[spark] branch master updated: [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new f824d058b14 [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`
f824d058b14 is described below

commit f824d058b14e3c58b1c90f64fefc45fac105c7dd
Author: Koray Beyaz
AuthorDate: Thu Aug 3 10:57:26 2023 +0500

    [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`

    ### What changes were proposed in this pull request?

    - Rename `_LEGACY_ERROR_TEMP_2175` as `RULE_ID_NOT_FOUND`.
    - Add a test case for the error class.

    ### Why are the changes needed?

    We are migrating onto error classes.

    ### Does this PR introduce _any_ user-facing change?

    Yes, the error message will include the error class name.

    ### How was this patch tested?

    `testOnly *RuleIdCollectionSuite` and GitHub Actions.

    Closes #40991 from kori73/SPARK-42330.

    Lead-authored-by: Koray Beyaz
    Co-authored-by: Koray Beyaz
    Signed-off-by: Max Gekk
---
 common/utils/src/main/resources/error/error-classes.json    | 11 ++-
 docs/sql-error-conditions.md                                |  6 ++
 .../org/apache/spark/sql/errors/QueryExecutionErrors.scala  |  5 ++---
 .../apache/spark/sql/errors/QueryExecutionErrorsSuite.scala | 11 +++
 4 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json
index a9619b97bd9..20f2ab4eb24 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -2471,6 +2471,12 @@
     ],
     "sqlState" : "42883"
   },
+  "RULE_ID_NOT_FOUND" : {
+    "message" : [
+      "Not found an id for the rule name \"<ruleName>\". Please modify RuleIdCollection.scala if you are adding a new rule."
+    ],
+    "sqlState" : "22023"
+  },
   "SCALAR_SUBQUERY_IS_IN_GROUP_BY_OR_AGGREGATE_FUNCTION" : {
     "message" : [
       "The correlated scalar subquery '<sqlExpr>' is neither present in GROUP BY, nor in an aggregate function. Add it to GROUP BY using ordinal position or wrap it in `first()` (or `first_value`) if you don't care which value you get."
@@ -5489,11 +5495,6 @@
       "."
     ]
   },
-  "_LEGACY_ERROR_TEMP_2175" : {
-    "message" : [
-      "Rule id not found for <ruleName>. Please modify RuleIdCollection.scala if you are adding a new rule."
-    ]
-  },
   "_LEGACY_ERROR_TEMP_2176" : {
     "message" : [
       "Cannot create array with <numElements> elements of data due to exceeding the limit <maxRoundedArrayLength> elements for ArrayData. <additionalErrorMessage>"
diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md
index 161f3bdbef1..5609d60f974 100644
--- a/docs/sql-error-conditions.md
+++ b/docs/sql-error-conditions.md
@@ -1586,6 +1586,12 @@ The function `<routineName>` cannot be found. Verify the spelling and correctnes
 If you did not qualify the name with a schema and catalog, verify the current_schema() output, or qualify the name with the correct schema and catalog.
 To tolerate the error on drop use DROP FUNCTION IF EXISTS.
 
+### RULE_ID_NOT_FOUND
+
+[SQLSTATE: 22023](sql-error-conditions-sqlstates.html#class-22-data-exception)
+
+Not found an id for the rule name "`<ruleName>`". Please modify RuleIdCollection.scala if you are adding a new rule.
+
 ### SCALAR_SUBQUERY_IS_IN_GROUP_BY_OR_AGGREGATE_FUNCTION
 
 SQLSTATE: none assigned
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
index 3622ffebb74..45b5d6b6692 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
@@ -1584,9 +1584,8 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase with ExecutionE
   def ruleIdNotFoundForRuleError(ruleName: String): Throwable = {
     new SparkException(
-      errorClass = "_LEGACY_ERROR_TEMP_2175",
-      messageParameters = Map(
-        "ruleName" -> ruleName),
+      errorClass = "RULE_ID_NOT_FOUND",
+      messageParameters = Map("ruleName" -> ruleName),
       cause = null)
   }
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala
index e70d04b7b5a..ae1c0a86a14 100644
--- a/sql/core/src/test/sca
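The `error-classes.json` entry above is a message template: `<ruleName>`-style placeholders are filled in from the `messageParameters` map passed to `SparkException`, and the error class name plus SQLSTATE are prepended to the user-visible message. A minimal, illustrative sketch of that substitution mechanism in Python (the registry layout mirrors the JSON above, but `format_error` and its exact output format are our stand-ins, not Spark's implementation):

```python
import re

# A tiny stand-in for error-classes.json (illustrative subset).
ERROR_CLASSES = {
    "RULE_ID_NOT_FOUND": {
        "message": [
            'Not found an id for the rule name "<ruleName>". '
            "Please modify RuleIdCollection.scala if you are adding a new rule."
        ],
        "sqlState": "22023",
    },
}

def format_error(error_class: str, parameters: dict) -> str:
    """Join the template lines and substitute <param> placeholders."""
    entry = ERROR_CLASSES[error_class]
    template = "".join(entry["message"])
    message = re.sub(r"<(\w+)>", lambda m: str(parameters[m.group(1)]), template)
    return f"[{error_class}] {message} SQLSTATE: {entry['sqlState']}"

print(format_error("RULE_ID_NOT_FOUND", {"ruleName": "MyNewRule"}))
```

Keeping messages in a single JSON registry is what makes renames like `_LEGACY_ERROR_TEMP_2175` → `RULE_ID_NOT_FOUND` a data change plus a one-line code change, rather than a hunt through string concatenations.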
[spark] branch branch-3.5 updated: [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new a1ca1e6e763 [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`
a1ca1e6e763 is described below

commit a1ca1e6e7633c3fbb36427a82635cda7d21f1dab
Author: Koray Beyaz
AuthorDate: Thu Aug 3 10:57:26 2023 +0500

    [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`

    ### What changes were proposed in this pull request?

    - Rename `_LEGACY_ERROR_TEMP_2175` as `RULE_ID_NOT_FOUND`.
    - Add a test case for the error class.

    ### Why are the changes needed?

    We are migrating onto error classes.

    ### Does this PR introduce _any_ user-facing change?

    Yes, the error message will include the error class name.

    ### How was this patch tested?

    `testOnly *RuleIdCollectionSuite` and GitHub Actions.

    Closes #40991 from kori73/SPARK-42330.

    Lead-authored-by: Koray Beyaz
    Co-authored-by: Koray Beyaz
    Signed-off-by: Max Gekk
    (cherry picked from commit f824d058b14e3c58b1c90f64fefc45fac105c7dd)
    Signed-off-by: Max Gekk
---
 common/utils/src/main/resources/error/error-classes.json    | 11 ++-
 docs/sql-error-conditions.md                                |  6 ++
 .../org/apache/spark/sql/errors/QueryExecutionErrors.scala  |  5 ++---
 .../apache/spark/sql/errors/QueryExecutionErrorsSuite.scala | 11 +++
 4 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json
index df425d7b2df..d9d1963c958 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -2412,6 +2412,12 @@
     ],
     "sqlState" : "42883"
   },
+  "RULE_ID_NOT_FOUND" : {
+    "message" : [
+      "Not found an id for the rule name \"<ruleName>\". Please modify RuleIdCollection.scala if you are adding a new rule."
+    ],
+    "sqlState" : "22023"
+  },
   "SCALAR_SUBQUERY_IS_IN_GROUP_BY_OR_AGGREGATE_FUNCTION" : {
     "message" : [
       "The correlated scalar subquery '<sqlExpr>' is neither present in GROUP BY, nor in an aggregate function. Add it to GROUP BY using ordinal position or wrap it in `first()` (or `first_value`) if you don't care which value you get."
@@ -5425,11 +5431,6 @@
       "."
     ]
   },
-  "_LEGACY_ERROR_TEMP_2175" : {
-    "message" : [
-      "Rule id not found for <ruleName>. Please modify RuleIdCollection.scala if you are adding a new rule."
-    ]
-  },
   "_LEGACY_ERROR_TEMP_2176" : {
     "message" : [
       "Cannot create array with <numElements> elements of data due to exceeding the limit <maxRoundedArrayLength> elements for ArrayData. <additionalErrorMessage>"
diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md
index 9e2a484d057..e1430e94db5 100644
--- a/docs/sql-error-conditions.md
+++ b/docs/sql-error-conditions.md
@@ -1578,6 +1578,12 @@ The function `<routineName>` cannot be found. Verify the spelling and correctnes
 If you did not qualify the name with a schema and catalog, verify the current_schema() output, or qualify the name with the correct schema and catalog.
 To tolerate the error on drop use DROP FUNCTION IF EXISTS.
 
+### RULE_ID_NOT_FOUND
+
+[SQLSTATE: 22023](sql-error-conditions-sqlstates.html#class-22-data-exception)
+
+Not found an id for the rule name "`<ruleName>`". Please modify RuleIdCollection.scala if you are adding a new rule.
+
 ### SCALAR_SUBQUERY_IS_IN_GROUP_BY_OR_AGGREGATE_FUNCTION
 
 SQLSTATE: none assigned
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
index 89c080409e2..7685e0f907c 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
@@ -1584,9 +1584,8 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase with ExecutionE
   def ruleIdNotFoundForRuleError(ruleName: String): Throwable = {
     new SparkException(
-      errorClass = "_LEGACY_ERROR_TEMP_2175",
-      messageParameters = Map(
-        "ruleName" -> ruleName),
+      errorClass = "RULE_ID_NOT_FOUND",
+      messageParameters = Map("ruleName" -> ruleName),
       cause = null)
   }
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala b/sql/core/src/test/sc
[spark] branch master updated: [SPARK-44628][SQL] Clear some unused codes in "***Errors" and extract some common logic
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 1f10cc4a594 [SPARK-44628][SQL] Clear some unused codes in "***Errors" and extract some common logic
1f10cc4a594 is described below

commit 1f10cc4a59457ed0de0fd4dc0a1c61514d77261a
Author: panbingkun
AuthorDate: Mon Aug 7 12:01:47 2023 +0500

    [SPARK-44628][SQL] Clear some unused codes in "***Errors" and extract some common logic

    ### What changes were proposed in this pull request?

    The pr aims to clear some unused codes in "***Errors" and extract some common logic.

    ### Why are the changes needed?

    Make code clear.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass GA.

    Closes #42238 from panbingkun/clear_error.

    Authored-by: panbingkun
    Signed-off-by: Max Gekk
---
 .../apache/spark/sql/errors/DataTypeErrors.scala   | 18 ++---
 .../apache/spark/sql/errors/QueryErrorsBase.scala  |  6 +-
 .../spark/sql/errors/QueryExecutionErrors.scala    | 86 --
 3 files changed, 10 insertions(+), 100 deletions(-)

diff --git a/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala b/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala
index 7a34a386cd8..5e52e283338 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala
@@ -192,15 +192,7 @@ private[sql] object DataTypeErrors extends DataTypeErrorsBase {
       decimalPrecision: Int,
       decimalScale: Int,
       context: SQLQueryContext = null): ArithmeticException = {
-    new SparkArithmeticException(
-      errorClass = "NUMERIC_VALUE_OUT_OF_RANGE",
-      messageParameters = Map(
-        "value" -> value.toPlainString,
-        "precision" -> decimalPrecision.toString,
-        "scale" -> decimalScale.toString,
-        "config" -> toSQLConf("spark.sql.ansi.enabled")),
-      context = getQueryContext(context),
-      summary = getSummary(context))
+    numericValueOutOfRange(value, decimalPrecision, decimalScale, context)
   }
 
   def cannotChangeDecimalPrecisionError(
@@ -208,6 +200,14 @@ private[sql] object DataTypeErrors extends DataTypeErrorsBase {
       decimalPrecision: Int,
       decimalScale: Int,
       context: SQLQueryContext = null): ArithmeticException = {
+    numericValueOutOfRange(value, decimalPrecision, decimalScale, context)
+  }
+
+  private def numericValueOutOfRange(
+      value: Decimal,
+      decimalPrecision: Int,
+      decimalScale: Int,
+      context: SQLQueryContext): ArithmeticException = {
     new SparkArithmeticException(
       errorClass = "NUMERIC_VALUE_OUT_OF_RANGE",
       messageParameters = Map(
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala
index db256fbee87..26600117a0c 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala
@@ -18,7 +18,7 @@
 package org.apache.spark.sql.errors
 
 import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
-import org.apache.spark.sql.catalyst.util.{toPrettySQL, QuotingUtils}
+import org.apache.spark.sql.catalyst.util.toPrettySQL
 import org.apache.spark.sql.types.{DataType, DoubleType, FloatType}
 
 /**
@@ -55,10 +55,6 @@ private[sql] trait QueryErrorsBase extends DataTypeErrorsBase {
     quoteByDefault(toPrettySQL(e))
   }
 
-  def toSQLSchema(schema: String): String = {
-    QuotingUtils.toSQLSchema(schema)
-  }
-
   // Converts an error class parameter to its SQL representation
   def toSQLValue(v: Any, t: DataType): String = Literal.create(v, t) match {
     case Literal(null, _) => "NULL"
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
index 45b5d6b6692..f960a091ec0 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
@@ -32,7 +32,6 @@
 import org.apache.spark._
 import org.apache.spark.launcher.SparkLauncher
 import org.apache.spark.memory.SparkOutOfMemoryError
 import org.apache.spark.sql.AnalysisException
-import org.apache.spark.sql.catalyst.ScalaReflection.Schema
 import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.catalyst.analysis.UnresolvedGenerator
 import org.ap
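The `DataTypeErrors` diff above shows the extraction pattern the commit message describes: two public error builders that previously constructed an identical `NUMERIC_VALUE_OUT_OF_RANGE` exception now delegate to one private helper. A rough Python sketch of the same refactoring (the dict stands in for the exception, and `out_of_decimal_type_range_error` is a hypothetical name — the chunk only shows that method's parameter list, not its name):

```python
def _numeric_value_out_of_range(value, precision, scale):
    # The single place that knows how to build this error payload;
    # before the refactoring this construction was duplicated verbatim
    # in both public methods below.
    return {
        "errorClass": "NUMERIC_VALUE_OUT_OF_RANGE",
        "messageParameters": {
            "value": str(value),
            "precision": str(precision),
            "scale": str(scale),
            "config": "spark.sql.ansi.enabled",
        },
    }

def out_of_decimal_type_range_error(value, precision, scale):
    return _numeric_value_out_of_range(value, precision, scale)

def cannot_change_decimal_precision_error(value, precision, scale):
    return _numeric_value_out_of_range(value, precision, scale)
```

The payoff is the same as in the Scala diff: a later change to the error's parameters happens in exactly one place, and the two public entry points cannot drift apart.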
[spark] branch master updated (1f10cc4a594 -> f139733b92d)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 1f10cc4a594 [SPARK-44628][SQL] Clear some unused codes in "***Errors" and extract some common logic
     add f139733b92d [SPARK-42321][SQL] Assign name to _LEGACY_ERROR_TEMP_2133

No new revisions were added by this update.

Summary of changes:
 .../utils/src/main/resources/error/error-classes.json  | 10 +-
 ...ditions-malformed-record-in-parsing-error-class.md  |  4
 .../spark/sql/catalyst/json/JacksonParser.scala        |  8
 .../spark/sql/catalyst/util/BadRecordException.scala   |  9 +
 .../spark/sql/catalyst/util/FailureSafeParser.scala    |  3 +++
 .../spark/sql/errors/QueryExecutionErrors.scala        | 19 ---
 .../spark/sql/errors/QueryExecutionErrorsSuite.scala   | 17 +
 7 files changed, 54 insertions(+), 16 deletions(-)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-38475][CORE] Use error class in org.apache.spark.serializer
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 2a23c7a18a0 [SPARK-38475][CORE] Use error class in org.apache.spark.serializer
2a23c7a18a0 is described below

commit 2a23c7a18a0ba75d95ee1d898896a8f0dc2c5531
Author: Bo Zhang
AuthorDate: Mon Aug 7 22:10:01 2023 +0500

    [SPARK-38475][CORE] Use error class in org.apache.spark.serializer

    ### What changes were proposed in this pull request?

    This PR aims to change exceptions created in package org.apache.spark.serializer to use error class.

    ### Why are the changes needed?

    This is to move exceptions created in package org.apache.spark.serializer to error class.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Existing tests.

    Closes #42243 from bozhang2820/spark-38475.

    Lead-authored-by: Bo Zhang
    Co-authored-by: Bo Zhang
    Signed-off-by: Max Gekk
---
 .../src/main/resources/error/error-classes.json    | 21 +
 .../spark/serializer/GenericAvroSerializer.scala   |  6 ++---
 .../apache/spark/serializer/KryoSerializer.scala   | 27 --
 docs/sql-error-conditions.md                       | 24 +++
 4 files changed, 68 insertions(+), 10 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json
index 680f787429c..0ea1eed35e4 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -831,6 +831,11 @@
       "Not found an encoder of the type <typeName> to Spark SQL internal representation. Consider to change the input type to one of supported at '<docroot>/sql-ref-datatypes.html'."
     ]
   },
+  "ERROR_READING_AVRO_UNKNOWN_FINGERPRINT" : {
+    "message" : [
+      "Error reading avro data -- encountered an unknown fingerprint: <fingerprint>, not sure what schema to use. This could happen if you registered additional schemas after starting your spark context."
+    ]
+  },
   "EVENT_TIME_IS_NOT_ON_TIMESTAMP_TYPE" : {
     "message" : [
       "The event time <eventName> has the invalid type <eventType>, but expected \"TIMESTAMP\"."
@@ -864,6 +869,11 @@
     ],
     "sqlState" : "22018"
   },
+  "FAILED_REGISTER_CLASS_WITH_KRYO" : {
+    "message" : [
+      "Failed to register classes with Kryo."
+    ]
+  },
   "FAILED_RENAME_PATH" : {
     "message" : [
       "Failed to rename <sourcePath> to <targetPath> as destination already exists."
@@ -1564,6 +1574,12 @@
     ],
     "sqlState" : "22032"
   },
+  "INVALID_KRYO_SERIALIZER_BUFFER_SIZE" : {
+    "message" : [
+      "The value of the config \"<bufferSizeConfKey>\" must be less than 2048 MiB, but got <bufferSizeConfValue> MiB."
+    ],
+    "sqlState" : "F0000"
+  },
   "INVALID_LAMBDA_FUNCTION_CALL" : {
     "message" : [
       "Invalid lambda function call."
@@ -2006,6 +2022,11 @@
       "The join condition <joinCondition> has the invalid type <conditionType>, expected \"BOOLEAN\"."
     ]
   },
+  "KRYO_BUFFER_OVERFLOW" : {
+    "message" : [
+      "Kryo serialization failed: <exceptionMsg>. To avoid this, increase \"<bufferSizeConfKey>\" value."
+    ]
+  },
   "LOAD_DATA_PATH_NOT_EXISTS" : {
     "message" : [
       "LOAD DATA input path does not exist: <path>."
diff --git a/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala b/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala
index 7d2923fdf37..d09abff2773 100644
--- a/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala
+++ b/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala
@@ -140,9 +140,9 @@ private[serializer] class GenericAvroSerializer[D <: GenericContainer]
         case Some(s) => new Schema.Parser().setValidateDefaults(false).parse(s)
         case None => throw new SparkException(
-          "Error reading attempting to read avro data -- encountered an unknown " +
-            s"fingerprint: $fingerprint, not sure what schema to use. This could happen " +
-            "if you registered additional schemas after starting your spark context.")
+          errorClass = "ERROR_READING_AVRO_UNKNOWN_FINGERPRINT",
+          messageParameters = Map("fingerprint" -> fingerprint.toString),
+          cause = null)
       }
     })
   } else {
diff --git a/core/src/main/scala/org/apache/spark/seriali
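The `GenericAvroSerializer` hunk above replaces a hand-concatenated message with a structured error when a schema fingerprint has no registered schema. The lookup-or-raise pattern can be sketched roughly like this in Python (the exception class and function names are ours for illustration, not Spark's API):

```python
class UnknownFingerprintError(Exception):
    """Structured error carrying an error class and named parameters."""

    def __init__(self, fingerprint):
        self.error_class = "ERROR_READING_AVRO_UNKNOWN_FINGERPRINT"
        self.message_parameters = {"fingerprint": str(fingerprint)}
        super().__init__(
            f"[{self.error_class}] Error reading avro data -- encountered an "
            f"unknown fingerprint: {fingerprint}, not sure what schema to use."
        )

def resolve_schema(fingerprint, registered_schemas):
    """Return the schema registered for a fingerprint, or raise a structured error."""
    try:
        return registered_schemas[fingerprint]
    except KeyError:
        raise UnknownFingerprintError(fingerprint) from None
```

The point of the change is that callers (and tests like `QueryExecutionErrorsSuite`) can now match on the stable `error_class` and `message_parameters` instead of parsing a free-form string.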
[spark] branch branch-3.5 updated: [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new b623c28f521 [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`
b623c28f521 is described below

commit b623c28f521e350b0f4bf15bfb911ca6bf0b1a80
Author: Max Gekk
AuthorDate: Tue Aug 8 13:26:19 2023 +0500

    [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`

    ### What changes were proposed in this pull request?

    In the PR, I propose to check whether the `DEFAULT` clause contains a parameter. If so, raise an appropriate error saying that the feature is not supported. Currently, table creation with a `DEFAULT` containing parameters finishes successfully even though parameters are not supported in this case:

    ```sql
    scala> spark.sql("CREATE TABLE t12(c1 int default :parm)", args = Map("parm" -> 5)).show()
    ++
    ||
    ++
    ++

    scala> spark.sql("describe t12");
    org.apache.spark.sql.AnalysisException: [INVALID_DEFAULT_VALUE.UNRESOLVED_EXPRESSION] Failed to execute EXISTS_DEFAULT command because the destination table column `c1` has a DEFAULT value :parm, which fails to resolve as a valid expression.
    ```

    ### Why are the changes needed?

    This improves user experience with Spark SQL by reporting the root cause of the issue.

    ### Does this PR introduce _any_ user-facing change?

    Yes. After the change, the table creation completes w/ the error:

    ```sql
    scala> spark.sql("CREATE TABLE t12(c1 int default :parm)", args = Map("parm" -> 5)).show()
    org.apache.spark.sql.catalyst.parser.ParseException: [UNSUPPORTED_FEATURE.PARAMETER_MARKER_IN_UNEXPECTED_STATEMENT] The feature is not supported: Parameter markers are not allowed in DEFAULT.(line 1, pos 32)

    == SQL ==
    CREATE TABLE t12(c1 int default :parm)
    --------------------------------^^^
    ```

    ### How was this patch tested?

    By running new test:
    ```
    $ build/sbt "test:testOnly *ParametersSuite"
    ```

    Closes #42365 from MaxGekk/fix-param-in-DEFAULT.

    Authored-by: Max Gekk
    Signed-off-by: Max Gekk
    (cherry picked from commit f7879b4c2500046cd7d889ba94adedd3000f8c41)
    Signed-off-by: Max Gekk
---
 .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 12
 .../test/scala/org/apache/spark/sql/ParametersSuite.scala | 15 +++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 7a28efa3e42..83938632e53 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -40,6 +40,7 @@
 import org.apache.spark.sql.catalyst.parser.SqlBaseParser._
 import org.apache.spark.sql.catalyst.plans._
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.trees.CurrentOrigin
+import org.apache.spark.sql.catalyst.trees.TreePattern.PARAMETER
 import org.apache.spark.sql.catalyst.types.DataTypeUtils
 import org.apache.spark.sql.catalyst.util.{CharVarcharUtils, DateTimeUtils, GeneratedColumn, IntervalUtils, ResolveDefaultColumns}
 import org.apache.spark.sql.catalyst.util.DateTimeUtils.{convertSpecialDate, convertSpecialTimestamp, convertSpecialTimestampNTZ, getZoneId, stringToDate, stringToTimestamp, stringToTimestampWithoutTimeZone}
@@ -3130,9 +3131,12 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
     ctx.asScala.headOption.map(visitLocationSpec)
   }
 
-  private def verifyAndGetExpression(exprCtx: ExpressionContext): String = {
+  private def verifyAndGetExpression(exprCtx: ExpressionContext, place: String): String = {
     // Make sure it can be converted to Catalyst expressions.
-    expression(exprCtx)
+    val expr = expression(exprCtx)
+    if (expr.containsPattern(PARAMETER)) {
+      throw QueryParsingErrors.parameterMarkerNotAllowed(place, expr.origin)
+    }
     // Extract the raw expression text so that we can save the user provided text. We don't
     // use `Expression.sql` to avoid storing incorrect text caused by bugs in any expression's
     // `sql` method. Note: `exprCtx.getText` returns a string without spaces, so we need to
@@ -3147,7 +3151,7 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
    */
   override def visitDefaultExpression(ctx: DefaultExpressionContext): String =
     withOrigin(ctx) {
-      verifyAndGetExpression(ctx.expression())
+      verifyAndGetExpression(ctx.expression(), "DEFAULT")
     }
 
   /**
@@
[spark] branch master updated: [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new f7879b4c250 [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`
f7879b4c250 is described below

commit f7879b4c2500046cd7d889ba94adedd3000f8c41
Author: Max Gekk
AuthorDate: Tue Aug 8 13:26:19 2023 +0500

    [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`

    ### What changes were proposed in this pull request?

    In the PR, I propose to check whether the `DEFAULT` clause contains a parameter. If so, raise an appropriate error saying that the feature is not supported. Currently, table creation with a `DEFAULT` containing parameters finishes successfully even though parameters are not supported in this case:

    ```sql
    scala> spark.sql("CREATE TABLE t12(c1 int default :parm)", args = Map("parm" -> 5)).show()
    ++
    ||
    ++
    ++

    scala> spark.sql("describe t12");
    org.apache.spark.sql.AnalysisException: [INVALID_DEFAULT_VALUE.UNRESOLVED_EXPRESSION] Failed to execute EXISTS_DEFAULT command because the destination table column `c1` has a DEFAULT value :parm, which fails to resolve as a valid expression.
    ```

    ### Why are the changes needed?

    This improves user experience with Spark SQL by reporting the root cause of the issue.

    ### Does this PR introduce _any_ user-facing change?

    Yes. After the change, the table creation completes w/ the error:

    ```sql
    scala> spark.sql("CREATE TABLE t12(c1 int default :parm)", args = Map("parm" -> 5)).show()
    org.apache.spark.sql.catalyst.parser.ParseException: [UNSUPPORTED_FEATURE.PARAMETER_MARKER_IN_UNEXPECTED_STATEMENT] The feature is not supported: Parameter markers are not allowed in DEFAULT.(line 1, pos 32)

    == SQL ==
    CREATE TABLE t12(c1 int default :parm)
    --------------------------------^^^
    ```

    ### How was this patch tested?

    By running new test:
    ```
    $ build/sbt "test:testOnly *ParametersSuite"
    ```

    Closes #42365 from MaxGekk/fix-param-in-DEFAULT.

    Authored-by: Max Gekk
    Signed-off-by: Max Gekk
---
 .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 12
 .../test/scala/org/apache/spark/sql/ParametersSuite.scala | 15 +++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 1b9dda51bf0..0635e6a1b44 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -40,6 +40,7 @@
 import org.apache.spark.sql.catalyst.parser.SqlBaseParser._
 import org.apache.spark.sql.catalyst.plans._
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.trees.CurrentOrigin
+import org.apache.spark.sql.catalyst.trees.TreePattern.PARAMETER
 import org.apache.spark.sql.catalyst.types.DataTypeUtils
 import org.apache.spark.sql.catalyst.util.{CharVarcharUtils, DateTimeUtils, GeneratedColumn, IntervalUtils, ResolveDefaultColumns}
 import org.apache.spark.sql.catalyst.util.DateTimeUtils.{convertSpecialDate, convertSpecialTimestamp, convertSpecialTimestampNTZ, getZoneId, stringToDate, stringToTimestamp, stringToTimestampWithoutTimeZone}
@@ -3153,9 +3154,12 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
     ctx.asScala.headOption.map(visitLocationSpec)
   }
 
-  private def verifyAndGetExpression(exprCtx: ExpressionContext): String = {
+  private def verifyAndGetExpression(exprCtx: ExpressionContext, place: String): String = {
     // Make sure it can be converted to Catalyst expressions.
-    expression(exprCtx)
+    val expr = expression(exprCtx)
+    if (expr.containsPattern(PARAMETER)) {
+      throw QueryParsingErrors.parameterMarkerNotAllowed(place, expr.origin)
+    }
     // Extract the raw expression text so that we can save the user provided text. We don't
     // use `Expression.sql` to avoid storing incorrect text caused by bugs in any expression's
     // `sql` method. Note: `exprCtx.getText` returns a string without spaces, so we need to
@@ -3170,7 +3174,7 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
    */
   override def visitDefaultExpression(ctx: DefaultExpressionContext): String =
     withOrigin(ctx) {
-      verifyAndGetExpression(ctx.expression())
+      verifyAndGetExpression(ctx.expression(), "DEFAULT")
     }
 
   /**
@@ -3178,7 +3182,7 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
    */
  over
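The fix above walks the parsed `DEFAULT` expression and rejects it if any node in the tree is a parameter marker (`expr.containsPattern(PARAMETER)`). The tree check itself is a plain recursive descent; a toy sketch in Python (this is a stand-in expression tree, not Catalyst's `TreeNode` API, and the node kinds are illustrative):

```python
class Expr:
    """A minimal expression-tree node: a kind tag plus children."""

    def __init__(self, kind, children=()):
        self.kind = kind
        self.children = list(children)

    def contains_pattern(self, pattern):
        # True if this node or any descendant has the given kind.
        return self.kind == pattern or any(
            c.contains_pattern(pattern) for c in self.children
        )

def verify_default_expression(expr):
    """Reject DEFAULT expressions that contain a parameter marker."""
    if expr.contains_pattern("PARAMETER"):
        raise ValueError(
            "[UNSUPPORTED_FEATURE.PARAMETER_MARKER_IN_UNEXPECTED_STATEMENT] "
            "Parameter markers are not allowed in DEFAULT."
        )
    return expr

# `DEFAULT :parm` would parse to a tree containing a PARAMETER node:
default_expr = Expr("Cast", [Expr("PARAMETER")])
```

Catalyst's real `containsPattern` is faster than this naive traversal (it caches pattern bit-sets per subtree), but the observable behavior is the same: the check fires at parse time, before any `CREATE TABLE` metadata is written.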
[spark] branch master updated: [SPARK-44778][SQL] Add the alias `TIMEDIFF` for `TIMESTAMPDIFF`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new b9fc5c03ed6 [SPARK-44778][SQL] Add the alias `TIMEDIFF` for `TIMESTAMPDIFF`
b9fc5c03ed6 is described below

commit b9fc5c03ed69e91d9c4cbe7ff5a1522c7b849568
Author: Max Gekk
AuthorDate: Sat Aug 12 11:08:39 2023 +0500

    [SPARK-44778][SQL] Add the alias `TIMEDIFF` for `TIMESTAMPDIFF`

    ### What changes were proposed in this pull request?

    In the PR, I propose to extend the rules of `primaryExpression` in `SqlBaseParser.g4` with one more function, `TIMEDIFF`, which accepts 3 args in the same way as the existing expression `TIMESTAMPDIFF`.

    ### Why are the changes needed?

    To achieve feature parity w/ other systems and make the migration to Spark SQL from such systems easier:
    1. Snowflake: https://docs.snowflake.com/en/sql-reference/functions/timediff
    2. MySQL/MariaDB: https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_timediff

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    By running the existing test suites:
    ```
    $ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"
    ```

    Closes #42435 from MaxGekk/timediff.

    Authored-by: Max Gekk
    Signed-off-by: Max Gekk
---
 docs/sql-ref-ansi-compliance.md                        |  1 +
 .../spark/sql/catalyst/parser/SqlBaseLexer.g4          |  1 +
 .../spark/sql/catalyst/parser/SqlBaseParser.g4         |  4 +-
 .../analyzer-results/ansi/timestamp.sql.out            | 68 ++
 .../analyzer-results/datetime-legacy.sql.out           | 68 ++
 .../sql-tests/analyzer-results/timestamp.sql.out       | 68 ++
 .../timestampNTZ/timestamp-ansi.sql.out                | 70 +++
 .../timestampNTZ/timestamp.sql.out                     | 70 +++
 .../test/resources/sql-tests/inputs/timestamp.sql      |  8 +++
 .../sql-tests/results/ansi/keywords.sql.out            |  1 +
 .../sql-tests/results/ansi/timestamp.sql.out           | 80 ++
 .../sql-tests/results/datetime-legacy.sql.out          | 80 ++
 .../resources/sql-tests/results/keywords.sql.out       |  1 +
 .../resources/sql-tests/results/timestamp.sql.out      | 80 ++
 .../results/timestampNTZ/timestamp-ansi.sql.out        | 80 ++
 .../results/timestampNTZ/timestamp.sql.out             | 80 ++
 .../ThriftServerWithSparkContextSuite.scala            |  2 +-
 17 files changed, 760 insertions(+), 2 deletions(-)

diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md
index f3a0e8f9afb..09c38a00995 100644
--- a/docs/sql-ref-ansi-compliance.md
+++ b/docs/sql-ref-ansi-compliance.md
@@ -636,6 +636,7 @@ Below is a list of all the keywords in Spark SQL.
 |TERMINATED|non-reserved|non-reserved|non-reserved|
 |THEN|reserved|non-reserved|reserved|
 |TIME|reserved|non-reserved|reserved|
+|TIMEDIFF|non-reserved|non-reserved|non-reserved|
 |TIMESTAMP|non-reserved|non-reserved|non-reserved|
 |TIMESTAMP_LTZ|non-reserved|non-reserved|non-reserved|
 |TIMESTAMP_NTZ|non-reserved|non-reserved|non-reserved|
diff --git a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
index bf6370575a1..d9128de0f5d 100644
--- a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
+++ b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
@@ -373,6 +373,7 @@ TEMPORARY: 'TEMPORARY' | 'TEMP';
 TERMINATED: 'TERMINATED';
 THEN: 'THEN';
 TIME: 'TIME';
+TIMEDIFF: 'TIMEDIFF';
 TIMESTAMP: 'TIMESTAMP';
 TIMESTAMP_LTZ: 'TIMESTAMP_LTZ';
 TIMESTAMP_NTZ: 'TIMESTAMP_NTZ';
diff --git a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
index a45ebee3106..7a69b10dadb 100644
--- a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
+++ b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
@@ -953,7 +953,7 @@ datetimeUnit
 primaryExpression
     : name=(CURRENT_DATE | CURRENT_TIMESTAMP | CURRENT_USER | USER)    #currentLike
     | name=(TIMESTAMPADD | DATEADD | DATE_ADD) LEFT_PAREN (unit=datetimeUnit | invalidUnit=stringLit) COMMA unitsAmount=valueExpression COMMA timestamp=valueExpression RIGHT_PAREN    #timestampadd
-    | name=(TIMESTAMPDIFF | DATEDIFF | DATE_DIFF) LEFT_PAREN (unit=datetimeUnit | invalidUnit=stringLit) COMMA startTimestamp=valueExpression COM
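The grammar change above makes `TIMEDIFF` just another name accepted by the same `primaryExpression` alternative as `TIMESTAMPDIFF`, `DATEDIFF`, and `DATE_DIFF`, so every alias lowers to one canonical operation. The alias-dispatch idea can be sketched in a few lines of Python (a toy resolver, not the ANTLR-generated parser; the tuple is a stand-in for a plan node):

```python
# Function names the (toy) parser accepts for the 3-argument
# timestamp-difference form, mirroring the grammar alternatives
# TIMESTAMPDIFF | DATEDIFF | DATE_DIFF plus the new TIMEDIFF alias.
TIMESTAMPDIFF_ALIASES = {"TIMESTAMPDIFF", "DATEDIFF", "DATE_DIFF", "TIMEDIFF"}

def parse_timestamp_diff(func_name, unit, start_ts, end_ts):
    """Lower any accepted alias to one canonical operation (a tuple here)."""
    if func_name.upper() not in TIMESTAMPDIFF_ALIASES:
        raise ValueError(f"not a timestamp-difference function: {func_name}")
    return ("TIMESTAMPDIFF", unit.upper(), start_ts, end_ts)
```

Because aliasing happens at the grammar level, `TIMEDIFF(HOUR, t1, t2)` and `TIMESTAMPDIFF(HOUR, t1, t2)` produce identical plans, which is why the PR only needs golden-file test updates rather than a new expression implementation.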
[spark] branch master updated: [SPARK-44404][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1009,1010,1013,1015,1016,1278]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 295c615b16b [SPARK-44404][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1009,1010,1013,1015,1016,1278] 295c615b16b is described below commit 295c615b16b8a77f242ffa99006b4fb95f8f3487 Author: panbingkun AuthorDate: Sat Aug 12 12:22:28 2023 +0500 [SPARK-44404][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1009,1010,1013,1015,1016,1278] ### What changes were proposed in this pull request? The PR aims to assign names to the error classes, including: - _LEGACY_ERROR_TEMP_1009 => VIEW_EXCEED_MAX_NESTED_DEPTH - _LEGACY_ERROR_TEMP_1010 => UNSUPPORTED_VIEW_OPERATION.WITHOUT_SUGGESTION - _LEGACY_ERROR_TEMP_1013 => UNSUPPORTED_VIEW_OPERATION.WITH_SUGGESTION / UNSUPPORTED_TEMP_VIEW_OPERATION.WITH_SUGGESTION - _LEGACY_ERROR_TEMP_1014 => UNSUPPORTED_TEMP_VIEW_OPERATION.WITHOUT_SUGGESTION - _LEGACY_ERROR_TEMP_1015 => UNSUPPORTED_TABLE_OPERATION.WITH_SUGGESTION - _LEGACY_ERROR_TEMP_1016 => UNSUPPORTED_TEMP_VIEW_OPERATION.WITHOUT_SUGGESTION - _LEGACY_ERROR_TEMP_1278 => UNSUPPORTED_TABLE_OPERATION.WITHOUT_SUGGESTION ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Pass GA. - Manual tests. - Updated UTs. Closes #42109 from panbingkun/SPARK-44404.
Lead-authored-by: panbingkun Co-authored-by: panbingkun <84731...@qq.com> Signed-off-by: Max Gekk --- R/pkg/tests/fulltests/test_sparkSQL.R | 3 +- .../src/main/resources/error/error-classes.json| 91 --- ...ions-unsupported-table-operation-error-class.md | 36 +++ ...-unsupported-temp-view-operation-error-class.md | 36 +++ ...tions-unsupported-view-operation-error-class.md | 36 +++ docs/sql-error-conditions.md | 30 +++ .../spark/sql/catalyst/analysis/Analyzer.scala | 9 +- .../sql/catalyst/analysis/v2ResolutionPlans.scala | 4 +- .../spark/sql/catalyst/parser/AstBuilder.scala | 32 ++- .../spark/sql/errors/QueryCompilationErrors.scala | 90 --- .../spark/sql/catalyst/parser/DDLParserSuite.scala | 104 .../apache/spark/sql/execution/command/views.scala | 2 +- .../apache/spark/sql/internal/CatalogImpl.scala| 2 +- .../analyzer-results/change-column.sql.out | 16 +- .../sql-tests/results/change-column.sql.out| 16 +- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 7 +- .../apache/spark/sql/execution/SQLViewSuite.scala | 267 ++--- .../spark/sql/execution/SQLViewTestSuite.scala | 23 +- .../AlterTableAddPartitionParserSuite.scala| 4 +- .../AlterTableDropPartitionParserSuite.scala | 8 +- .../AlterTableRecoverPartitionsParserSuite.scala | 8 +- .../AlterTableRenamePartitionParserSuite.scala | 4 +- .../command/AlterTableSetLocationParserSuite.scala | 6 +- .../command/AlterTableSetSerdeParserSuite.scala| 16 +- .../spark/sql/execution/command/DDLSuite.scala | 36 ++- .../command/MsckRepairTableParserSuite.scala | 13 +- .../command/ShowPartitionsParserSuite.scala| 10 +- .../command/TruncateTableParserSuite.scala | 6 +- .../execution/command/TruncateTableSuiteBase.scala | 45 +++- .../execution/command/v1/ShowPartitionsSuite.scala | 57 - .../apache/spark/sql/internal/CatalogSuite.scala | 13 +- .../spark/sql/hive/execution/HiveDDLSuite.scala| 94 +++- 32 files changed, 717 insertions(+), 407 deletions(-) diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R 
b/R/pkg/tests/fulltests/test_sparkSQL.R index d61501d248a..47688d7560c 100644 --- a/R/pkg/tests/fulltests/test_sparkSQL.R +++ b/R/pkg/tests/fulltests/test_sparkSQL.R @@ -4193,8 +4193,7 @@ test_that("catalog APIs, listTables, getTable, listColumns, listFunctions, funct # recoverPartitions does not work with temporary view expect_error(recoverPartitions("cars"), - paste("Error in recoverPartitions : analysis error - cars is a temp view.", - "'recoverPartitions()' expects a table"), fixed = TRUE) + "[UNSUPPORTED_TEMP_VIEW_OPERATION.WITH_SUGGESTION]*`cars`*") expect_error(refreshTable("cars"), NA) expect_error(refreshByPath("/"), NA) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 133c2dd826c..08f79bcecbb 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -3394,12 +3394,63 @@
[spark] branch master updated: [SPARK-43780][SQL][FOLLOWUP] Fix the config doc `spark.sql.optimizer.decorrelateJoinPredicate.enabled`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 24293cab2de [SPARK-43780][SQL][FOLLOWUP] Fix the config doc `spark.sql.optimizer.decorrelateJoinPredicate.enabled` 24293cab2de is described below commit 24293cab2de06a50ffd9f4871073e75481665bb8 Author: Max Gekk AuthorDate: Tue Aug 22 15:32:32 2023 +0300 [SPARK-43780][SQL][FOLLOWUP] Fix the config doc `spark.sql.optimizer.decorrelateJoinPredicate.enabled` ### What changes were proposed in this pull request? Add s" to the doc of the SQL config `spark.sql.optimizer.decorrelateJoinPredicate.enabled`. ### Why are the changes needed? To output the desired config name. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running CI. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42607 from MaxGekk/followup-agubichev_spark-43780-corr-predicate. Authored-by: Max Gekk Signed-off-by: Max Gekk --- sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 9b421251cf6..ca155683ec0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -4363,7 +4363,7 @@ object SQLConf { .internal() .doc("Decorrelate scalar and lateral subqueries with correlated references in join " + "predicates. 
This configuration is only effective when " + -"'${DECORRELATE_INNER_QUERY_ENABLED.key}' is true.") +s"'${DECORRELATE_INNER_QUERY_ENABLED.key}' is true.") .version("4.0.0") .booleanConf .createWithDefault(true) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
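The one-character fix above is the classic missing-interpolator bug: without the `s` prefix, Scala treats the string as a plain literal, so the `${...}` placeholder lands verbatim in the config doc instead of the referenced config name. Python's f-strings fail the same way; a minimal illustration (the key string here is only illustrative):

```python
key = "spark.sql.optimizer.decorrelateInnerQuery.enabled"

# Without the f prefix, the placeholder is NOT substituted -- the same
# class of bug as a Scala interpolated string missing its s prefix.
plain = "This configuration is only effective when '{key}' is true."
fixed = f"This configuration is only effective when '{key}' is true."

print(plain)  # the literal text '{key}' leaks into the doc
print(fixed)  # the real config name is substituted
```

The bug is easy to miss in review because both forms compile and the broken string still looks plausible in the output.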
[spark] branch branch-3.4 updated: [SPARK-44871][SQL][3.4] Fix percentile_disc behaviour
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 0060279f733 [SPARK-44871][SQL][3.4] Fix percentile_disc behaviour 0060279f733 is described below commit 0060279f733989b03aca2bbb0624dfc0c3193aae Author: Peter Toth AuthorDate: Tue Aug 22 19:27:15 2023 +0300 [SPARK-44871][SQL][3.4] Fix percentile_disc behaviour ### What changes were proposed in this pull request? This PR fixes the `percentile_disc()` function, as it currently returns incorrect results in some cases. E.g.: ``` SELECT percentile_disc(0.0) WITHIN GROUP (ORDER BY a) as p0, percentile_disc(0.1) WITHIN GROUP (ORDER BY a) as p1, percentile_disc(0.2) WITHIN GROUP (ORDER BY a) as p2, percentile_disc(0.3) WITHIN GROUP (ORDER BY a) as p3, percentile_disc(0.4) WITHIN GROUP (ORDER BY a) as p4, percentile_disc(0.5) WITHIN GROUP (ORDER BY a) as p5, percentile_disc(0.6) WITHIN GROUP (ORDER BY a) as p6, percentile_disc(0.7) WITHIN GROUP (ORDER BY a) as p7, percentile_disc(0.8) WITHIN GROUP (ORDER BY a) as p8, percentile_disc(0.9) WITHIN GROUP (ORDER BY a) as p9, percentile_disc(1.0) WITHIN GROUP (ORDER BY a) as p10 FROM VALUES (0), (1), (2), (3), (4) AS v(a) ``` currently returns: ``` +---+---+---+---+---+---+---+---+---+---+---+ | p0| p1| p2| p3| p4| p5| p6| p7| p8| p9|p10| +---+---+---+---+---+---+---+---+---+---+---+ |0.0|0.0|0.0|1.0|1.0|2.0|2.0|2.0|3.0|3.0|4.0| +---+---+---+---+---+---+---+---+---+---+---+ ``` but after this PR it returns the correct: ``` +---+---+---+---+---+---+---+---+---+---+---+ | p0| p1| p2| p3| p4| p5| p6| p7| p8| p9|p10| +---+---+---+---+---+---+---+---+---+---+---+ |0.0|0.0|0.0|1.0|1.0|2.0|2.0|3.0|3.0|4.0|4.0| +---+---+---+---+---+---+---+---+---+---+---+ ``` ### Why are the changes needed? Bugfix. ### Does this PR introduce _any_ user-facing change? 
Yes, fixes a correctness bug, but the old behaviour can be restored with `spark.sql.legacy.percentileDiscCalculation=true`. ### How was this patch tested? Added new UTs. Closes #42610 from peter-toth/SPARK-44871-fix-percentile-disc-behaviour-3.4. Authored-by: Peter Toth Signed-off-by: Max Gekk --- .../expressions/aggregate/percentiles.scala| 39 +-- .../org/apache/spark/sql/internal/SQLConf.scala| 10 ++ .../resources/sql-tests/inputs/percentiles.sql | 77 +- .../sql-tests/results/percentiles.sql.out | 116 + 4 files changed, 234 insertions(+), 8 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala index 8447a5f9b51..da04c5a1c8a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala @@ -27,6 +27,7 @@ import org.apache.spark.sql.catalyst.expressions.Cast._ import org.apache.spark.sql.catalyst.trees.{BinaryLike, TernaryLike, UnaryLike} import org.apache.spark.sql.catalyst.util._ import org.apache.spark.sql.errors.QueryExecutionErrors +import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types._ import org.apache.spark.sql.types.TypeCollection.NumericAndAnsiInterval import org.apache.spark.util.collection.OpenHashMap @@ -168,11 +169,8 @@ abstract class PercentileBase val accumulatedCounts = sortedCounts.scanLeft((sortedCounts.head._1, 0L)) { case ((key1, count1), (key2, count2)) => (key2, count1 + count2) }.tail -val maxPosition = accumulatedCounts.last._2 - 1 -percentages.map { percentile => - getPercentile(accumulatedCounts, maxPosition * percentile) -} +percentages.map(getPercentile(accumulatedCounts, _)) } private def generateOutput(percentiles: Seq[Double]): Any = { @@ -195,8 +193,11 @@ abstract class PercentileBase * This 
function has been based upon similar function from HIVE * `org.apache.hadoop.hive.ql.udf.UDAFPercentile.getPercentile()`. */ - private def getPercentile( - accumulatedCounts: Seq[(AnyRef, Long)], position: Double): Double = { + protected def getPercentile( + accumulatedCounts: Seq[(AnyRef, Long)], + percentile: Double): Double = { +val position = (accumulatedCounts.last._2 - 1) * percentile + // We may need to do linear interpolation to get the exact percentile val lower = position.floor.toLong val higher = position.ceil.toLong @@ -219,6 +220,7 @@ abstract class PercentileBase } if (discrete) { +
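The corrected result rows follow the SQL-standard definition of a discrete percentile: the smallest value whose cumulative distribution (rank / n) reaches the requested fraction. A minimal Python sketch of those semantics — not Spark's actual implementation, which works over accumulated counts as in the diff above:

```python
def percentile_disc(values, fraction):
    """Return the smallest value whose cumulative distribution
    (rank / n over the sorted values) is >= fraction."""
    xs = sorted(values)
    n = len(xs)
    for rank, v in enumerate(xs, start=1):
        if rank / n >= fraction:
            return v
    return xs[-1]

data = [0, 1, 2, 3, 4]
print([percentile_disc(data, p / 10) for p in range(11)])
# -> [0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4]  (matches the fixed p0..p10 row)
```

Note how p7 = 3 (cumulative distribution of 3 is 0.8 >= 0.7), which is exactly the cell that changed in the "after" table.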
[spark] branch master updated: [SPARK-44840][SQL] Make `array_insert()` 1-based for negative indexes
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ce50a563d31 [SPARK-44840][SQL] Make `array_insert()` 1-based for negative indexes ce50a563d31 is described below commit ce50a563d311ccfe36d1fcc4f0743e4e4d7d8116 Author: Max Gekk AuthorDate: Tue Aug 22 21:04:32 2023 +0300 [SPARK-44840][SQL] Make `array_insert()` 1-based for negative indexes ### What changes were proposed in this pull request? In the PR, I propose to make the `array_insert` function 1-based for negative indexes. So, the maximum negative index, -1, should point to the last element, and the function should insert a new element at the end of the given array for the index -1. The old behaviour can be restored via the SQL config `spark.sql.legacy.negativeIndexInArrayInsert`. ### Why are the changes needed? 1. To match the behaviour of functions such as `substr()` and `element_at()`. ```sql spark-sql (default)> select element_at(array('a', 'b'), -1), substr('ab', -1); b b ``` 2. To fix an inconsistency in `array_insert` in which positive indexes are 1-based, but negative indexes are 0-based. ### Does this PR introduce _any_ user-facing change? Yes. Before: ```sql spark-sql (default)> select array_insert(array('a', 'b'), -1, 'c'); ["a","c","b"] ``` After: ```sql spark-sql (default)> select array_insert(array('a', 'b'), -1, 'c'); ["a","b","c"] ``` ### How was this patch tested? By running the modified test suite: ``` $ build/sbt "test:testOnly *CollectionExpressionsSuite" $ build/sbt "test:testOnly *DataFrameFunctionsSuite" $ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite" ``` Closes #42564 from MaxGekk/fix-array_insert. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../explain-results/function_array_insert.explain | 2 +- .../explain-results/function_array_prepend.explain | 2 +- docs/sql-migration-guide.md| 1 + python/pyspark/sql/functions.py| 2 +- .../expressions/collectionOperations.scala | 37 ++-- .../org/apache/spark/sql/internal/SQLConf.scala| 16 +++ .../expressions/CollectionExpressionsSuite.scala | 50 +++--- .../scala/org/apache/spark/sql/functions.scala | 2 +- .../sql-tests/analyzer-results/ansi/array.sql.out | 44 +++ .../sql-tests/analyzer-results/array.sql.out | 44 +++ .../src/test/resources/sql-tests/inputs/array.sql | 5 +++ .../resources/sql-tests/results/ansi/array.sql.out | 34 ++- .../test/resources/sql-tests/results/array.sql.out | 34 ++- .../apache/spark/sql/DataFrameFunctionsSuite.scala | 6 ++- 14 files changed, 218 insertions(+), 61 deletions(-) diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_array_insert.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_array_insert.explain index edcd790596b..f5096a363a3 100644 --- a/connector/connect/common/src/test/resources/query-tests/explain-results/function_array_insert.explain +++ b/connector/connect/common/src/test/resources/query-tests/explain-results/function_array_insert.explain @@ -1,2 +1,2 @@ -Project [array_insert(e#0, 0, 1) AS array_insert(e, 0, 1)#0] +Project [array_insert(e#0, 0, 1, false) AS array_insert(e, 0, 1)#0] +- LocalRelation , [id#0L, a#0, b#0, d#0, e#0, f#0, g#0] diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_array_prepend.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_array_prepend.explain index 4c3e7c85d64..1b20682b09d 100644 --- a/connector/connect/common/src/test/resources/query-tests/explain-results/function_array_prepend.explain +++ 
b/connector/connect/common/src/test/resources/query-tests/explain-results/function_array_prepend.explain @@ -1,2 +1,2 @@ -Project [array_insert(e#0, 1, 1) AS array_prepend(e, 1)#0] +Project [array_insert(e#0, 1, 1, false) AS array_prepend(e, 1)#0] +- LocalRelation , [id#0L, a#0, b#0, d#0, e#0, f#0, g#0] diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index c71b16cd8d6..5fc323ec1b0 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -29,6 +29,7 @@ license: | - Since Spark 3.5, Row's json and prettyJson methods are moved to `ToJsonUtil`. - Since Spark 3.5, the `plan` field is moved from
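The new negative-index rule can be sketched in a few lines of Python. This is a simplified model of the semantics described in the commit message, not Spark's implementation — in particular it skips the NULL padding Spark performs for out-of-range positions:

```python
def array_insert(arr, pos, item, legacy=False):
    """Sketch of array_insert indexing after SPARK-44840.

    Positive pos is 1-based from the start. Negative pos is now 1-based
    from the end, so pos = -1 appends. legacy=True models the old
    0-based negative indexing restored by the SQL config
    spark.sql.legacy.negativeIndexInArrayInsert.
    """
    if pos == 0:
        raise ValueError("array_insert position must not be 0")
    result = list(arr)
    if pos > 0:
        result.insert(pos - 1, item)           # 1-based from the start
    else:
        # new behaviour shifts negative positions by one vs. the legacy rule
        result.insert(len(result) + pos + (0 if legacy else 1), item)
    return result

print(array_insert(["a", "b"], -1, "c"))               # ['a', 'b', 'c']
print(array_insert(["a", "b"], -1, "c", legacy=True))  # ['a', 'c', 'b']
```

The two print lines reproduce the "After" and "Before" SQL examples from the commit message; note also that `array_insert(e, 1, x)` still prepends, matching `array_prepend` in the explain diffs above.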
[spark] branch branch-3.3 updated (352810b2b45 -> aa6f6f74dc9)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git from 352810b2b45 [SPARK-44920][CORE] Use await() instead of awaitUninterruptibly() in TransportClientFactory.createClient() add aa6f6f74dc9 [SPARK-44871][SQL][3.3] Fix percentile_disc behaviour No new revisions were added by this update. Summary of changes: .../expressions/aggregate/percentiles.scala| 39 +-- .../org/apache/spark/sql/internal/SQLConf.scala| 10 ++ .../resources/sql-tests/inputs/percentiles.sql | 74 + .../sql-tests/results/percentiles.sql.out | 118 + 4 files changed, 234 insertions(+), 7 deletions(-) create mode 100644 sql/core/src/test/resources/sql-tests/inputs/percentiles.sql create mode 100644 sql/core/src/test/resources/sql-tests/results/percentiles.sql.out
[spark] branch master updated: [SPARK-44975][SQL] Remove BinaryArithmetic useless override resolved
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 04339e30dbd [SPARK-44975][SQL] Remove BinaryArithmetic useless override resolved 04339e30dbd is described below commit 04339e30dbdda2805edbac7e1e3cd8dfb5c3c608 Author: Jia Fan AuthorDate: Sat Aug 26 21:11:20 2023 +0300 [SPARK-44975][SQL] Remove BinaryArithmetic useless override resolved ### What changes were proposed in this pull request? Remove the useless `resolved` override from `BinaryArithmetic`; it is exactly the same as in the abstract class `Expression`. ### Why are the changes needed? Remove useless logic. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests. ### Was this patch authored or co-authored using generative AI tooling? Closes #42689 from Hisoka-X/SPARK-44975_remove_resolved_override. Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala| 2 -- 1 file changed, 2 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala index 31d4d71cd40..2d9bccc0854 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala @@ -264,8 +264,6 @@ abstract class BinaryArithmetic extends BinaryOperator final override val nodePatterns: Seq[TreePattern] = Seq(BINARY_ARITHMETIC) - override lazy val resolved: Boolean = childrenResolved && checkInputDataTypes().isSuccess - override def initQueryContext(): Option[SQLQueryContext] = { if (failOnError) { Some(origin.context)
[spark] branch master updated: [SPARK-44983][SQL] Convert binary to string by `to_char` for the formats: `hex`, `base64`, `utf-8`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4946d025b62 [SPARK-44983][SQL] Convert binary to string by `to_char` for the formats: `hex`, `base64`, `utf-8` 4946d025b62 is described below commit 4946d025b6200ad90dfdfbb1f24526016f810523 Author: Max Gekk AuthorDate: Mon Aug 28 16:55:35 2023 +0300 [SPARK-44983][SQL] Convert binary to string by `to_char` for the formats: `hex`, `base64`, `utf-8` ### What changes were proposed in this pull request? In the PR, I propose to re-use the `Hex`, `Base64` and `Decode` expressions in the `ToCharacter` (the `to_char`/`to_varchar` functions) when the `format` parameter is one of `hex`, `base64` and `utf-8`. ### Why are the changes needed? To make the migration to Spark SQL easier from systems like: - Snowflake: https://docs.snowflake.com/en/sql-reference/functions/to_char - SAP SQL Anywhere: https://help.sap.com/docs/SAP_SQL_Anywhere/93079d4ba8e44920ae63ffb4def91f5b/81fe51196ce21014b9c6cf43b298.html - Oracle: https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/TO_CHAR-number.html#GUID-00DA076D-2468-41AB-A3AC-CC78DBA0D9CB - Vertica: https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Formatting/TO_CHAR.htm ### Does this PR introduce _any_ user-facing change? No. This PR extends the existing API. It might be considered a user-facing change only if the user's code depends on errors in the case of wrong formats. ### How was this patch tested? By running new examples: ``` $ build/sbt "sql/test:testOnly org.apache.spark.sql.expressions.ExpressionInfoSuite" ``` and new tests: ``` $ build/sbt "test:testOnly *.StringFunctionsSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42632 from MaxGekk/to_char-binary-2. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 5 ++ ...nditions-invalid-parameter-value-error-class.md | 4 ++ .../expressions/numberFormatExpressions.scala | 28 +++-- .../spark/sql/errors/QueryCompilationErrors.scala | 9 +++ .../apache/spark/sql/StringFunctionsSuite.scala| 69 +++--- 5 files changed, 89 insertions(+), 26 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 632c449b992..53c596c00fc 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -1788,6 +1788,11 @@ "expects a binary value with 16, 24 or 32 bytes, but got bytes." ] }, + "BINARY_FORMAT" : { +"message" : [ + "expects one of binary formats 'base64', 'hex', 'utf-8', but got ." +] + }, "DATETIME_UNIT" : { "message" : [ "expects one of the units without quotes YEAR, QUARTER, MONTH, WEEK, DAY, DAYOFYEAR, HOUR, MINUTE, SECOND, MILLISECOND, MICROSECOND, but got the string literal ." diff --git a/docs/sql-error-conditions-invalid-parameter-value-error-class.md b/docs/sql-error-conditions-invalid-parameter-value-error-class.md index 370e6da3362..96829e564aa 100644 --- a/docs/sql-error-conditions-invalid-parameter-value-error-class.md +++ b/docs/sql-error-conditions-invalid-parameter-value-error-class.md @@ -37,6 +37,10 @@ supports 16-byte CBC IVs and 12-byte GCM IVs, but got `` bytes for expects a binary value with 16, 24 or 32 bytes, but got `` bytes. +## BINARY_FORMAT + +expects one of binary formats 'base64', 'hex', 'utf-8', but got ``. + ## DATETIME_UNIT expects one of the units without quotes YEAR, QUARTER, MONTH, WEEK, DAY, DAYOFYEAR, HOUR, MINUTE, SECOND, MILLISECOND, MICROSECOND, but got the string literal ``. 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala index 3a424ac21c5..7875ed8fe20 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala @@ -26,7 +26,7 @@ import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, CodeGe import org.apache.spark.sql.catalyst.expressions.codegen.Block.BlockHelper import org.apache.spark.sql.catalyst.util.ToNumberParser import org.apache.spark
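The three accepted formats map directly onto standard encodings. A small Python model of the behaviour the commit describes — a sketch, not Spark's `ToCharacter` code; the upper-casing of the `hex` output is an assumption here, matching how Spark's standalone `hex()` function prints letters:

```python
import base64


def to_char_binary(value: bytes, fmt: str) -> str:
    """Sketch of to_char(binary, format) for the formats added by SPARK-44983."""
    f = fmt.lower()
    if f == "hex":
        return value.hex().upper()
    if f == "base64":
        return base64.b64encode(value).decode("ascii")
    if f == "utf-8":
        return value.decode("utf-8")
    # mirrors the new INVALID_PARAMETER_VALUE.BINARY_FORMAT error condition
    raise ValueError(
        f"expects one of binary formats 'base64', 'hex', 'utf-8', but got '{fmt}'")


print(to_char_binary(b"Spark", "base64"))  # U3Bhcms=
print(to_char_binary(b"abc", "hex"))       # 616263
```

Anything outside the three supported formats raises, which is the case the commit message flags as the only potentially user-visible behaviour change.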
[spark] branch master updated: [SPARK-44868][SQL][FOLLOWUP] Invoke the `to_varchar` function in Scala API
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 49da438ece8 [SPARK-44868][SQL][FOLLOWUP] Invoke the `to_varchar` function in Scala API 49da438ece8 is described below commit 49da438ece84391db22f9c56e747d555d9b01969 Author: Max Gekk AuthorDate: Mon Aug 28 20:57:27 2023 +0300 [SPARK-44868][SQL][FOLLOWUP] Invoke the `to_varchar` function in Scala API ### What changes were proposed in this pull request? In the PR, I propose to invoke the `to_varchar` function instead of `to_char` in `to_varchar` of the Scala/Java API. ### Why are the changes needed? 1. To show the correct function name in error messages and in `explain`. 2. To be consistent with other APIs: PySpark and the previous Spark SQL version 3.5.0. ### Does this PR introduce _any_ user-facing change? Yes. ### How was this patch tested? By running the modified test: ``` $ build/sbt "test:testOnly *.StringFunctionsSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42703 from MaxGekk/fix-to_varchar-call. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- sql/core/src/main/scala/org/apache/spark/sql/functions.scala | 2 +- .../src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala| 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala index f6699b66af9..6b474c84cdb 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala @@ -4431,7 +4431,7 @@ object functions { * @group string_funcs * @since 3.5.0 */ - def to_varchar(e: Column, format: Column): Column = to_char(e, format) + def to_varchar(e: Column, format: Column): Column = call_function("to_varchar", e, format) /** * Convert string 'e' to a number based on the string format 'format'. diff --git a/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala index 12881f4a22a..03b9053c71a 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala @@ -878,7 +878,7 @@ class StringFunctionsSuite extends QueryTest with SharedSparkSession { errorClass = "_LEGACY_ERROR_TEMP_1100", parameters = Map( "argName" -> "format", - "funcName" -> "to_char", + "funcName" -> funcName, "requiredType" -> "string")) checkError( exception = intercept[AnalysisException] { errorClass = "INVALID_PARAMETER_VALUE.BINARY_FORMAT", parameters = Map( "parameter" -> "`format`", - "functionName" -> "`to_char`", + "functionName" -> s"`$funcName`", "invalidFormat" -> "'invalid_format'")) checkError( exception = intercept[AnalysisException] {
[spark] branch master updated (8505084bc26 -> a7eef211691)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 8505084bc26 [SPARK-45003][PYTHON][DOCS] Refine docstring of `asc/desc` add a7eef211691 [SPARK-43438][SQL] Error on missing input columns in `INSERT` No new revisions were added by this update. Summary of changes: .../catalyst/analysis/TableOutputResolver.scala| 15 +- .../spark/sql/execution/datasources/rules.scala| 6 ++- .../spark/sql/ResolveDefaultColumnsSuite.scala | 59 +- .../org/apache/spark/sql/sources/InsertSuite.scala | 18 --- .../org/apache/spark/sql/hive/InsertSuite.scala| 2 +- .../spark/sql/hive/execution/HiveQuerySuite.scala | 6 +-- 6 files changed, 69 insertions(+), 37 deletions(-)
[spark] branch branch-3.5 updated: [SPARK-43438][SQL] Error on missing input columns in `INSERT`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 24bd29cc56a [SPARK-43438][SQL] Error on missing input columns in `INSERT` 24bd29cc56a is described below commit 24bd29cc56a7e12a45d713b5ca0bf2205b80a8f6 Author: Max Gekk AuthorDate: Tue Aug 29 23:04:44 2023 +0300 [SPARK-43438][SQL] Error on missing input columns in `INSERT` ### What changes were proposed in this pull request? In the PR, I propose to raise an error when a user uses V1 `INSERT` without a list of columns, and the number of inserted columns doesn't match the number of actual table columns. At the moment Spark inserts data successfully in such a case after the PR https://github.com/apache/spark/pull/41262 which changed the behaviour of Spark 3.4.x. ### Why are the changes needed? 1. To conform to the SQL standard, which requires the number of columns to be the same: ![Screenshot 2023-08-07 at 11 01 27 AM](https://github.com/apache/spark/assets/1580697/c55badec-5716-490f-a83a-0bb6b22c84c7) Apparently, the insertion below must not succeed: ```sql spark-sql (default)> CREATE TABLE tabtest(c1 INT, c2 INT); spark-sql (default)> INSERT INTO tabtest SELECT 1; ``` 2. To have the same behaviour as **Spark 3.4**: ```sql spark-sql (default)> INSERT INTO tabtest SELECT 1; `spark_catalog`.`default`.`tabtest` requires that the data to be inserted have the same number of columns as the target table: target table has 2 column(s) but the inserted data has 1 column(s), including 0 partition column(s) having constant value(s). ``` ### Does this PR introduce _any_ user-facing change? Yes. 
After the changes: ```sql spark-sql (default)> INSERT INTO tabtest SELECT 1; [INSERT_COLUMN_ARITY_MISMATCH.NOT_ENOUGH_DATA_COLUMNS] Cannot write to `spark_catalog`.`default`.`tabtest`, the reason is not enough data columns: Table columns: `c1`, `c2`. Data columns: `1`. ``` ### How was this patch tested? By running the modified tests: ``` $ build/sbt "test:testOnly *InsertSuite" $ build/sbt "test:testOnly *ResolveDefaultColumnsSuite" $ build/sbt -Phive "test:testOnly *HiveQuerySuite" ``` Closes #42393 from MaxGekk/fix-num-cols-insert. Authored-by: Max Gekk Signed-off-by: Max Gekk (cherry picked from commit a7eef2116919bd0c1a1b52adaf49de903e8c9c46) Signed-off-by: Max Gekk --- .../catalyst/analysis/TableOutputResolver.scala| 15 +- .../spark/sql/execution/datasources/rules.scala| 6 ++- .../spark/sql/ResolveDefaultColumnsSuite.scala | 59 +- .../org/apache/spark/sql/sources/InsertSuite.scala | 18 --- .../org/apache/spark/sql/hive/InsertSuite.scala| 2 +- .../spark/sql/hive/execution/HiveQuerySuite.scala | 6 +-- 6 files changed, 69 insertions(+), 37 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala index 894cd0b3991..6671836b351 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala @@ -65,22 +65,11 @@ object TableOutputResolver { errors += _, fillDefaultValue = supportColDefaultValue) } else { - // If the target table needs more columns than the input query, fill them with - // the columns' default values, if the `supportColDefaultValue` parameter is true. 
- val fillDefaultValue = supportColDefaultValue && actualExpectedCols.size > query.output.size - val queryOutputCols = if (fillDefaultValue) { -query.output ++ actualExpectedCols.drop(query.output.size).flatMap { expectedCol => - getDefaultValueExprOrNullLit(expectedCol, conf.useNullsForMissingDefaultColumnValues) -} - } else { -query.output - } - if (actualExpectedCols.size > queryOutputCols.size) { + if (actualExpectedCols.size > query.output.size) { throw QueryCompilationErrors.cannotWriteNotEnoughColumnsToTableError( tableName, actualExpectedCols.map(_.name), query) } - - resolveColumnsByPosition(tableName, queryOutputCols, actualExpectedCols, conf, errors += _) + resolveColumnsByPosition(tableName, query.output, actualExpectedCols, conf, errors += _) } if (errors.nonEmpty) { diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala b/sql/core/src/main/scala/org/apache/spark/sql/execut
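The diff above restores a strict arity check: with no explicit column list, a query producing fewer columns than the table has is rejected instead of being padded with column defaults. A hedged Python sketch of that check — the function name and exact message shape are illustrative, not Spark's `TableOutputResolver` code:

```python
def check_insert_arity(table_cols, data_cols):
    """Reject a V1 INSERT (without a column list) whose query produces
    fewer columns than the target table, per SPARK-43438."""
    if len(data_cols) < len(table_cols):
        raise ValueError(
            "[INSERT_COLUMN_ARITY_MISMATCH.NOT_ENOUGH_DATA_COLUMNS] "
            f"not enough data columns: Table columns: {table_cols}. "
            f"Data columns: {data_cols}.")


check_insert_arity(["c1", "c2"], ["a", "b"])   # OK: arity matches
try:
    check_insert_arity(["c1", "c2"], ["1"])    # INSERT INTO tabtest SELECT 1
except ValueError as e:
    print(e)
```

This mirrors the `tabtest` example from the commit message: two table columns, one data column, so the insert errors out as it did in Spark 3.4.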
[spark] branch master updated: [SPARK-44987][SQL] Assign a name to the error class `_LEGACY_ERROR_TEMP_1100`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e72ce91250a [SPARK-44987][SQL] Assign a name to the error class `_LEGACY_ERROR_TEMP_1100` e72ce91250a is described below commit e72ce91250a9a2c40fd5ed55a50dbc46e4e7e46d Author: Max Gekk AuthorDate: Thu Aug 31 22:50:21 2023 +0300 [SPARK-44987][SQL] Assign a name to the error class `_LEGACY_ERROR_TEMP_1100` ### What changes were proposed in this pull request? In the PR, I propose to assign the name `NON_FOLDABLE_ARGUMENT` to the legacy error class `_LEGACY_ERROR_TEMP_1100`, and improve the error message format: make it less restrictive. ### Why are the changes needed? 1. To avoid confusing users with a slightly restrictive error message about literals. 2. To assign a proper name as part of the activity in SPARK-37935. ### Does this PR introduce _any_ user-facing change? No. Only if the user's code depends on the error class name and message parameters. ### How was this patch tested? By running the modified and affected tests: ``` $ build/sbt "test:testOnly *.StringFunctionsSuite" $ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite" $ build/sbt "core/testOnly *SparkThrowableSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42737 from MaxGekk/assign-name-_LEGACY_ERROR_TEMP_1100.
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 11 --- docs/sql-error-conditions.md | 6 .../catalyst/expressions/datetimeExpressions.scala | 2 +- .../sql/catalyst/expressions/mathExpressions.scala | 4 +-- .../expressions/numberFormatExpressions.scala | 2 +- .../spark/sql/errors/QueryCompilationErrors.scala | 14 + .../ceil-floor-with-scale-param.sql.out| 36 -- .../sql-tests/analyzer-results/extract.sql.out | 18 ++- .../results/ceil-floor-with-scale-param.sql.out| 36 -- .../resources/sql-tests/results/extract.sql.out| 18 ++- .../apache/spark/sql/StringFunctionsSuite.scala| 8 ++--- 11 files changed, 88 insertions(+), 67 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 3b537cc3d9f..af78dd2f9f8 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -2215,6 +2215,12 @@ ], "sqlState" : "42607" }, + "NON_FOLDABLE_ARGUMENT" : { +"message" : [ + "The function requires the parameter to be a foldable expression of the type , but the actual argument is a non-foldable." +], +"sqlState" : "22024" + }, "NON_LAST_MATCHED_CLAUSE_OMIT_CONDITION" : { "message" : [ "When there are more than one MATCHED clauses in a MERGE statement, only the last MATCHED clause can omit the condition." @@ -4029,11 +4035,6 @@ "() doesn't support the mode. Acceptable modes are and ." ] }, - "_LEGACY_ERROR_TEMP_1100" : { -"message" : [ - "The '' parameter of function '' needs to be a literal." -] - }, "_LEGACY_ERROR_TEMP_1103" : { "message" : [ "Unsupported component type in arrays." 
diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md index 89c27f72ea0..33072f6c440 100644 --- a/docs/sql-error-conditions.md +++ b/docs/sql-error-conditions.md @@ -1305,6 +1305,12 @@ Cannot call function `` because named argument references are not It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query. +### NON_FOLDABLE_ARGUMENT + +[SQLSTATE: 22024](sql-error-conditions-sqlstates.html#class-22-data-exception) + +The function `` requires the parameter `` to be a foldable expression of the type ``, but the actual argument is a non-foldable. + ### NON_LAST_MATCHED_CLAUSE_OMIT_CONDITION [SQLSTATE: 42613](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala index 51ddf2b85f8..30a6bec1868 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
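The `NON_FOLDABLE_ARGUMENT` error above fires when a parameter such as the scale of `ceil(expr, scale)` cannot be reduced to a constant during analysis. What "foldable" means can be illustrated with a tiny expression tree; this is not Spark's `Expression` API, just a minimal stand-in for the idea:

```scala
// Illustrative mini expression tree (not Spark's Expression class hierarchy).
// An expression is "foldable" if it can be evaluated to a constant before
// execution; a column reference is not, so passing one where a foldable
// argument is required raises NON_FOLDABLE_ARGUMENT.
sealed trait Expr { def foldable: Boolean }
case class Literal(value: Any) extends Expr { val foldable = true }
case class ColumnRef(name: String) extends Expr { val foldable = false }
case class Add(l: Expr, r: Expr) extends Expr { def foldable = l.foldable && r.foldable }

def checkFoldableArgument(funcName: String, paramName: String, arg: Expr): Option[String] =
  if (arg.foldable) None
  else Some(s"[NON_FOLDABLE_ARGUMENT] The function $funcName requires the parameter " +
    s"$paramName to be a foldable expression.")
```

Note that `Add(Literal(1), Literal(2))` is still foldable even though it is not itself a literal, which is exactly why the new message is phrased in terms of foldability rather than literals.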
[spark] branch master updated (f2a6c97d718 -> d03ebced0ef)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from f2a6c97d718 [SPARK-44876][PYTHON][FOLLOWUP] Fix Arrow-optimized Python UDF to delay wrapping the function with fail_on_stopiteration add d03ebced0ef [SPARK-45060][SQL] Fix an internal error from `to_char()` on `NULL` format No new revisions were added by this update. Summary of changes: common/utils/src/main/resources/error/error-classes.json | 5 + ...error-conditions-invalid-parameter-value-error-class.md | 4 .../sql/catalyst/expressions/numberFormatExpressions.scala | 8 ++-- .../apache/spark/sql/errors/QueryCompilationErrors.scala | 8 .../scala/org/apache/spark/sql/StringFunctionsSuite.scala | 14 ++ 5 files changed, 37 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (b0b7835bee2 -> 416207659aa)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from b0b7835bee2 [SPARK-45059][CONNECT][PYTHON] Add `try_reflect` functions to Scala and Python add 416207659aa [SPARK-45033][SQL] Support maps by parameterized `sql()` No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/analysis/parameters.scala | 15 -- .../org/apache/spark/sql/ParametersSuite.scala | 62 +- 2 files changed, 72 insertions(+), 5 deletions(-)
[spark] branch master updated: [SPARK-45070][SQL][DOCS] Describe the binary and datetime formats of `to_char`/`to_varchar`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 637f16e7ff8 [SPARK-45070][SQL][DOCS] Describe the binary and datetime formats of `to_char`/`to_varchar` 637f16e7ff8 is described below commit 637f16e7ff88c2aef0e7f29163e13138ff472c1d Author: Max Gekk AuthorDate: Wed Sep 6 08:25:41 2023 +0300 [SPARK-45070][SQL][DOCS] Describe the binary and datetime formats of `to_char`/`to_varchar` ### What changes were proposed in this pull request? In the PR, I propose to document the recent changes related to the `format` of the `to_char`/`to_varchar` functions: 1. binary formats added by https://github.com/apache/spark/pull/42632 2. datetime formats introduced by https://github.com/apache/spark/pull/42534 ### Why are the changes needed? To inform users about recent changes. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By CI. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42801 from MaxGekk/doc-to_char-api. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../main/scala/org/apache/spark/sql/functions.scala| 18 -- python/pyspark/sql/functions.py| 12 .../main/scala/org/apache/spark/sql/functions.scala| 18 ++ 3 files changed, 46 insertions(+), 2 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala index 527848e95e6..54bf0106956 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala @@ -4280,6 +4280,7 @@ object functions { */ def to_binary(e: Column): Column = Column.fn("to_binary", e) + // scalastyle:off line.size.limit /** * Convert `e` to a string based on the `format`. Throws an exception if the conversion fails. * @@ -4300,13 +4301,20 @@ object functions { * (optional, only allowed once at the beginning or end of the format string). Note that 'S' * prints '+' for positive values but 'MI' prints a space. 'PR': Only allowed at the * end of the format string; specifies that the result string will be wrapped by angle - * brackets if the input value is negative. + * brackets if the input value is negative. If `e` is a datetime, `format` shall be + * a valid datetime pattern, see https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html";>Datetime + * Patterns. If `e` is a binary, it is converted to a string in one of the formats: + * 'base64': a base 64 string. 'hex': a string in the hexadecimal format. + * 'utf-8': the input binary is decoded to UTF-8 string. * * @group string_funcs * @since 3.5.0 */ + // scalastyle:on line.size.limit def to_char(e: Column, format: Column): Column = Column.fn("to_char", e, format) + // scalastyle:off line.size.limit /** * Convert `e` to a string based on the `format`. Throws an exception if the conversion fails. 
* @@ -4327,11 +4335,17 @@ object functions { * (optional, only allowed once at the beginning or end of the format string). Note that 'S' * prints '+' for positive values but 'MI' prints a space. 'PR': Only allowed at the * end of the format string; specifies that the result string will be wrapped by angle - * brackets if the input value is negative. + * brackets if the input value is negative. If `e` is a datetime, `format` shall be + * a valid datetime pattern, see https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html";>Datetime + * Patterns. If `e` is a binary, it is converted to a string in one of the formats: + * 'base64': a base 64 string. 'hex': a string in the hexadecimal format. + * 'utf-8': the input binary is decoded to UTF-8 string. * * @group string_funcs * @since 3.5.0 */ + // scalastyle:on line.size.limit def to_varchar(e: Column, format: Column): Column = Column.fn("to_varchar", e, format) /** diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py index 56b436421af..de91cced206 100644 --- a/python/pyspark/sql/functions.py +++ b/python/pyspark/sql/functions.py @@ -10902,6 +10902,12 @@ def to_char(col: "ColumnOrName", format: "ColumnOrName") -> Column: values but 'MI' prints a space. &
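The three binary formats documented above ('base64', 'hex', 'utf-8') can be sketched with plain JDK utilities. This is not Spark's implementation (which lives in the number-format expressions), only a Spark-free illustration of the observable conversions; the helper name `binaryToString` is invented:

```scala
// Minimal sketch of to_char/to_varchar's documented binary-input formats.
import java.nio.charset.StandardCharsets
import java.util.Base64

def binaryToString(bytes: Array[Byte], format: String): String = format.toLowerCase match {
  case "base64" => Base64.getEncoder.encodeToString(bytes)          // base 64 string
  case "hex"    => bytes.map(b => f"${b & 0xff}%02X").mkString      // hexadecimal string
  case "utf-8"  => new String(bytes, StandardCharsets.UTF_8)        // decode as UTF-8 text
  case other    => throw new IllegalArgumentException(s"Unsupported format: $other")
}
```

The `b & 0xff` masking matters: formatting a negative `Byte` directly with `%02X` would sign-extend it to an `Int` and print eight hex digits instead of two.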
[spark] branch master updated: [SPARK-45079][SQL] Fix an internal error from `percentile_approx()` on `NULL` accuracy
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 24b29adcf53 [SPARK-45079][SQL] Fix an internal error from `percentile_approx()`on `NULL` accuracy 24b29adcf53 is described below commit 24b29adcf53616067a9fa2ca201e3f4d2f54436b Author: Max Gekk AuthorDate: Wed Sep 6 10:32:37 2023 +0300 [SPARK-45079][SQL] Fix an internal error from `percentile_approx()`on `NULL` accuracy ### What changes were proposed in this pull request? In the PR, I propose to check the `accuracy` argument is not a NULL in `ApproximatePercentile`. And if it is, throw an `AnalysisException` with new error class `DATATYPE_MISMATCH.UNEXPECTED_NULL`. ### Why are the changes needed? To fix the issue demonstrated by the example: ```sql $ spark-sql (default)> SELECT percentile_approx(col, array(0.5, 0.4, 0.1), NULL) FROM VALUES (0), (1), (2), (10) AS tab(col); [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running new test: ``` $ build/sbt "test:testOnly *.ApproximatePercentileQuerySuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42817 from MaxGekk/fix-internal-error-in-percentile_approx. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../aggregate/ApproximatePercentile.scala | 7 - .../sql/ApproximatePercentileQuerySuite.scala | 31 ++ 2 files changed, 37 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala index 3c3afc1c7e7..5b44c3fa31b 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala @@ -97,7 +97,8 @@ case class ApproximatePercentile( } // Mark as lazy so that accuracyExpression is not evaluated during tree transformation. - private lazy val accuracy: Long = accuracyExpression.eval().asInstanceOf[Number].longValue + private lazy val accuracyNum = accuracyExpression.eval().asInstanceOf[Number] + private lazy val accuracy: Long = accuracyNum.longValue override def inputTypes: Seq[AbstractDataType] = { // Support NumericType, DateType, TimestampType and TimestampNTZType since their internal types @@ -138,6 +139,10 @@ case class ApproximatePercentile( "inputExpr" -> toSQLExpr(accuracyExpression) ) ) +} else if (accuracyNum == null) { + DataTypeMismatch( +errorSubClass = "UNEXPECTED_NULL", +messageParameters = Map("exprName" -> "accuracy")) } else if (accuracy <= 0 || accuracy > Int.MaxValue) { DataTypeMismatch( errorSubClass = "VALUE_OUT_OF_RANGE", diff --git a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala index 18e8dd6249b..273e8e08fd7 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala @@ -339,4 +339,35 @@ class 
ApproximatePercentileQuerySuite extends QueryTest with SharedSparkSession Row(Period.ofMonths(200).normalized(), null, Duration.ofSeconds(200L))) } } + + test("SPARK-45079: NULL arguments of percentile_approx") { +checkError( + exception = intercept[AnalysisException] { +sql( + """ +|SELECT percentile_approx(col, array(0.5, 0.4, 0.1), NULL) +|FROM VALUES (0), (1), (2), (10) AS tab(col); +|""".stripMargin).collect() + }, + errorClass = "DATATYPE_MISMATCH.UNEXPECTED_NULL", + parameters = Map( +"exprName" -> "accuracy", +"sqlExpr" -> "\"percentile_approx(col, array(0.5, 0.4, 0.1), NULL)\""), + context = ExpectedContext( +"", "", 8, 57, "percentile_approx(col, array(0.5, 0.4, 0.1), NULL)")) +checkError( + exception = intercept[AnalysisException] {
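The essence of the fix in the diff above is to evaluate the accuracy argument once as a boxed `Number` and report `UNEXPECTED_NULL` before unboxing it: the old code called `.longValue` directly on the evaluation result, so a `NULL` accuracy surfaced as an internal error rather than a proper `DATATYPE_MISMATCH`. A Spark-free sketch of that check order (the function name and error strings here are illustrative):

```scala
// Sketch of the null-before-unbox check added to ApproximatePercentile.
def checkAccuracy(accuracyNum: java.lang.Number): Either[String, Long] = {
  if (accuracyNum == null) {
    // Checked first: unboxing a null Number would throw a NullPointerException.
    Left("DATATYPE_MISMATCH.UNEXPECTED_NULL: accuracy")
  } else {
    val accuracy = accuracyNum.longValue
    if (accuracy <= 0 || accuracy > Int.MaxValue) {
      Left("DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE: accuracy")
    } else {
      Right(accuracy)
    }
  }
}
```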
[spark] branch branch-3.5 updated: [SPARK-45079][SQL] Fix an internal error from `percentile_approx()` on `NULL` accuracy
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 9b750e93035 [SPARK-45079][SQL] Fix an internal error from `percentile_approx()`on `NULL` accuracy 9b750e93035 is described below commit 9b750e930357eae092420f09ca9366e49dc589e2 Author: Max Gekk AuthorDate: Wed Sep 6 10:32:37 2023 +0300 [SPARK-45079][SQL] Fix an internal error from `percentile_approx()`on `NULL` accuracy ### What changes were proposed in this pull request? In the PR, I propose to check the `accuracy` argument is not a NULL in `ApproximatePercentile`. And if it is, throw an `AnalysisException` with new error class `DATATYPE_MISMATCH.UNEXPECTED_NULL`. ### Why are the changes needed? To fix the issue demonstrated by the example: ```sql $ spark-sql (default)> SELECT percentile_approx(col, array(0.5, 0.4, 0.1), NULL) FROM VALUES (0), (1), (2), (10) AS tab(col); [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running new test: ``` $ build/sbt "test:testOnly *.ApproximatePercentileQuerySuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42817 from MaxGekk/fix-internal-error-in-percentile_approx. 
Authored-by: Max Gekk Signed-off-by: Max Gekk (cherry picked from commit 24b29adcf53616067a9fa2ca201e3f4d2f54436b) Signed-off-by: Max Gekk --- .../aggregate/ApproximatePercentile.scala | 7 - .../sql/ApproximatePercentileQuerySuite.scala | 31 ++ 2 files changed, 37 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala index 3c3afc1c7e7..5b44c3fa31b 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala @@ -97,7 +97,8 @@ case class ApproximatePercentile( } // Mark as lazy so that accuracyExpression is not evaluated during tree transformation. - private lazy val accuracy: Long = accuracyExpression.eval().asInstanceOf[Number].longValue + private lazy val accuracyNum = accuracyExpression.eval().asInstanceOf[Number] + private lazy val accuracy: Long = accuracyNum.longValue override def inputTypes: Seq[AbstractDataType] = { // Support NumericType, DateType, TimestampType and TimestampNTZType since their internal types @@ -138,6 +139,10 @@ case class ApproximatePercentile( "inputExpr" -> toSQLExpr(accuracyExpression) ) ) +} else if (accuracyNum == null) { + DataTypeMismatch( +errorSubClass = "UNEXPECTED_NULL", +messageParameters = Map("exprName" -> "accuracy")) } else if (accuracy <= 0 || accuracy > Int.MaxValue) { DataTypeMismatch( errorSubClass = "VALUE_OUT_OF_RANGE", diff --git a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala index 18e8dd6249b..273e8e08fd7 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala +++ 
b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala @@ -339,4 +339,35 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSparkSession Row(Period.ofMonths(200).normalized(), null, Duration.ofSeconds(200L))) } } + + test("SPARK-45079: NULL arguments of percentile_approx") { +checkError( + exception = intercept[AnalysisException] { +sql( + """ +|SELECT percentile_approx(col, array(0.5, 0.4, 0.1), NULL) +|FROM VALUES (0), (1), (2), (10) AS tab(col); +|""".stripMargin).collect() + }, + errorClass = "DATATYPE_MISMATCH.UNEXPECTED_NULL", + parameters = Map( +"exprName" -> "accuracy", +"sqlExpr" -> "\"percentile_approx(col, array(0.5, 0.4, 0.1), NULL)\""), + context = ExpectedContext( +"", "", 8, 57, "percentile_approx(col, array(0.5
[spark] branch branch-3.4 updated: [SPARK-45079][SQL] Fix an internal error from `percentile_approx()` on `NULL` accuracy
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new f0b421553bc [SPARK-45079][SQL] Fix an internal error from `percentile_approx()`on `NULL` accuracy f0b421553bc is described below commit f0b421553bc1850cc3e8ed5d564da8f6425cd244 Author: Max Gekk AuthorDate: Wed Sep 6 10:32:37 2023 +0300 [SPARK-45079][SQL] Fix an internal error from `percentile_approx()`on `NULL` accuracy ### What changes were proposed in this pull request? In the PR, I propose to check the `accuracy` argument is not a NULL in `ApproximatePercentile`. And if it is, throw an `AnalysisException` with new error class `DATATYPE_MISMATCH.UNEXPECTED_NULL`. ### Why are the changes needed? To fix the issue demonstrated by the example: ```sql $ spark-sql (default)> SELECT percentile_approx(col, array(0.5, 0.4, 0.1), NULL) FROM VALUES (0), (1), (2), (10) AS tab(col); [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running new test: ``` $ build/sbt "test:testOnly *.ApproximatePercentileQuerySuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42817 from MaxGekk/fix-internal-error-in-percentile_approx. 
Authored-by: Max Gekk Signed-off-by: Max Gekk (cherry picked from commit 24b29adcf53616067a9fa2ca201e3f4d2f54436b) Signed-off-by: Max Gekk --- .../aggregate/ApproximatePercentile.scala | 7 - .../sql/ApproximatePercentileQuerySuite.scala | 31 ++ 2 files changed, 37 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala index 1499f358ac4..ebf1085c0c1 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala @@ -96,7 +96,8 @@ case class ApproximatePercentile( } // Mark as lazy so that accuracyExpression is not evaluated during tree transformation. - private lazy val accuracy: Long = accuracyExpression.eval().asInstanceOf[Number].longValue + private lazy val accuracyNum = accuracyExpression.eval().asInstanceOf[Number] + private lazy val accuracy: Long = accuracyNum.longValue override def inputTypes: Seq[AbstractDataType] = { // Support NumericType, DateType, TimestampType and TimestampNTZType since their internal types @@ -137,6 +138,10 @@ case class ApproximatePercentile( "inputExpr" -> toSQLExpr(accuracyExpression) ) ) +} else if (accuracyNum == null) { + DataTypeMismatch( +errorSubClass = "UNEXPECTED_NULL", +messageParameters = Map("exprName" -> "accuracy")) } else if (accuracy <= 0 || accuracy > Int.MaxValue) { DataTypeMismatch( errorSubClass = "VALUE_OUT_OF_RANGE", diff --git a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala index 9237c9e9486..8598e92f029 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala +++ 
b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala @@ -337,4 +337,35 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSparkSession Row(Period.ofMonths(200).normalized(), null, Duration.ofSeconds(200L))) } } + + test("SPARK-45079: NULL arguments of percentile_approx") { +checkError( + exception = intercept[AnalysisException] { +sql( + """ +|SELECT percentile_approx(col, array(0.5, 0.4, 0.1), NULL) +|FROM VALUES (0), (1), (2), (10) AS tab(col); +|""".stripMargin).collect() + }, + errorClass = "DATATYPE_MISMATCH.UNEXPECTED_NULL", + parameters = Map( +"exprName" -> "accuracy", +"sqlExpr" -> "\"percentile_approx(col, array(0.5, 0.4, 0.1), NULL)\""), + context = ExpectedContext( +"", "", 8, 57, "percentile_approx(col, array(0.5
[spark] branch branch-3.3 updated: [SPARK-45079][SQL][3.3] Fix an internal error from `percentile_approx()` on `NULL` accuracy
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 5250ed65cf2 [SPARK-45079][SQL][3.3] Fix an internal error from `percentile_approx()` on `NULL` accuracy 5250ed65cf2 is described below commit 5250ed65cf2c70e4b456c96c1006b854f56ef1f2 Author: Max Gekk AuthorDate: Wed Sep 6 18:56:14 2023 +0300 [SPARK-45079][SQL][3.3] Fix an internal error from `percentile_approx()` on `NULL` accuracy ### What changes were proposed in this pull request? In the PR, I propose to check the `accuracy` argument is not a NULL in `ApproximatePercentile`. And if it is, throw an `AnalysisException` with new error class `DATATYPE_MISMATCH.UNEXPECTED_NULL`. This is a backport of https://github.com/apache/spark/pull/42817. ### Why are the changes needed? To fix the issue demonstrated by the example: ```sql $ spark-sql (default)> SELECT percentile_approx(col, array(0.5, 0.4, 0.1), NULL) FROM VALUES (0), (1), (2), (10) AS tab(col); [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running new test: ``` $ build/sbt "test:testOnly *.ApproximatePercentileQuerySuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Authored-by: Max Gekk (cherry picked from commit 24b29adcf53616067a9fa2ca201e3f4d2f54436b) Closes #42835 from MaxGekk/fix-internal-error-in-percentile_approx-3.3. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../expressions/aggregate/ApproximatePercentile.scala | 5 - .../spark/sql/ApproximatePercentileQuerySuite.scala | 19 +++ 2 files changed, 23 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala index d8eccc075a2..b816e4a9719 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala @@ -95,7 +95,8 @@ case class ApproximatePercentile( } // Mark as lazy so that accuracyExpression is not evaluated during tree transformation. - private lazy val accuracy: Long = accuracyExpression.eval().asInstanceOf[Number].longValue + private lazy val accuracyNum = accuracyExpression.eval().asInstanceOf[Number] + private lazy val accuracy: Long = accuracyNum.longValue override def inputTypes: Seq[AbstractDataType] = { // Support NumericType, DateType, TimestampType and TimestampNTZType since their internal types @@ -120,6 +121,8 @@ case class ApproximatePercentile( defaultCheck } else if (!percentageExpression.foldable || !accuracyExpression.foldable) { TypeCheckFailure(s"The accuracy or percentage provided must be a constant literal") +} else if (accuracyNum == null) { + TypeCheckFailure("Accuracy value must not be null") } else if (accuracy <= 0 || accuracy > Int.MaxValue) { TypeCheckFailure(s"The accuracy provided must be a literal between (0, ${Int.MaxValue}]" + s" (current value = $accuracy)") diff --git a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala index 9237c9e9486..3fd1592a107 100644 --- 
a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala @@ -337,4 +337,23 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSparkSession Row(Period.ofMonths(200).normalized(), null, Duration.ofSeconds(200L))) } } + + test("SPARK-45079: NULL arguments of percentile_approx") { +val e1 = intercept[AnalysisException] { + sql( +""" + |SELECT percentile_approx(col, array(0.5, 0.4, 0.1), NULL) + |FROM VALUES (0), (1), (2), (10) AS tab(col); + |""".stripMargin).collect() +} +assert(e1.getMessage.contains("Accuracy value must not be null")) +val e2 = intercept[AnalysisException] { + sql( +""" + |SELECT percentile_approx(col, NULL
[spark] branch master updated (aaf413ce351 -> fd424caf6c4)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from aaf413ce351 [SPARK-44508][PYTHON][DOCS] Add user guide for Python user-defined table functions add fd424caf6c4 [SPARK-45100][SQL] Fix an internal error from `reflect()` on `NULL` class and method No new revisions were added by this update. Summary of changes: .../expressions/CallMethodViaReflection.scala| 8 .../org/apache/spark/sql/MiscFunctionsSuite.scala| 20 2 files changed, 28 insertions(+)
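What `reflect('java.util.UUID', 'fromString', ...)` does can be sketched with plain Java reflection: resolve a class and a static method by name, then invoke it. With a `NULL` class or method name the lookup itself would throw, which is why the patch adds an `UNEXPECTED_NULL` check up front instead of letting analysis die with an internal error. The helper below is a hedged illustration, not Spark's `CallMethodViaReflection` code:

```scala
// Hypothetical sketch of reflect()'s null checks followed by static invocation.
def callStatic(className: String, methodName: String, arg: String): Either[String, String] = {
  if (className == null) {
    Left("DATATYPE_MISMATCH.UNEXPECTED_NULL: `class`")
  } else if (methodName == null) {
    Left("DATATYPE_MISMATCH.UNEXPECTED_NULL: `method`")
  } else {
    // Static method => invoke with a null receiver.
    val m = Class.forName(className).getMethod(methodName, classOf[String])
    Right(String.valueOf(m.invoke(null, arg)))
  }
}
```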
[spark] branch branch-3.5 updated: [SPARK-45100][SQL] Fix an internal error from `reflect()` on `NULL` class and method
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 8f730e779ff [SPARK-45100][SQL] Fix an internal error from `reflect()`on `NULL` class and method 8f730e779ff is described below commit 8f730e779ff64773beb20ad633151e866cfff7f2 Author: Max Gekk AuthorDate: Fri Sep 8 11:12:54 2023 +0300 [SPARK-45100][SQL] Fix an internal error from `reflect()`on `NULL` class and method ### What changes were proposed in this pull request? In the PR, I propose to check that the `class` and `method` arguments are not a NULL in `CallMethodViaReflection`. And if they are, throw an `AnalysisException` with new error class `DATATYPE_MISMATCH.UNEXPECTED_NULL`. ### Why are the changes needed? To fix the issue demonstrated by the example: ```sql $ spark-sql (default)> select reflect('java.util.UUID', CAST(NULL AS STRING)); [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running new test: ``` $ build/sbt "test:testOnly *.MiscFunctionsSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42849 from MaxGekk/fix-internal-error-in-reflect. 
Authored-by: Max Gekk Signed-off-by: Max Gekk (cherry picked from commit fd424caf6c46e7030ac2deb2afbe3f4a5fc1095c) Signed-off-by: Max Gekk --- .../expressions/CallMethodViaReflection.scala| 8 .../org/apache/spark/sql/MiscFunctionsSuite.scala| 20 2 files changed, 28 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala index 52b057a3276..4511b5b548d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala @@ -78,6 +78,10 @@ case class CallMethodViaReflection(children: Seq[Expression]) "inputExpr" -> toSQLExpr(children.head) ) ) +case (e, 0) if e.eval() == null => + DataTypeMismatch( +errorSubClass = "UNEXPECTED_NULL", +messageParameters = Map("exprName" -> toSQLId("class"))) case (e, 1) if !(e.dataType == StringType && e.foldable) => DataTypeMismatch( errorSubClass = "NON_FOLDABLE_INPUT", @@ -87,6 +91,10 @@ case class CallMethodViaReflection(children: Seq[Expression]) "inputExpr" -> toSQLExpr(children(1)) ) ) +case (e, 1) if e.eval() == null => + DataTypeMismatch( +errorSubClass = "UNEXPECTED_NULL", +messageParameters = Map("exprName" -> toSQLId("method"))) case (e, idx) if idx > 1 && !CallMethodViaReflection.typeMapping.contains(e.dataType) => DataTypeMismatch( errorSubClass = "UNEXPECTED_INPUT_TYPE", diff --git a/sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala index 074556fa2f9..b890ae73fb6 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala @@ -232,6 +232,26 @@ class MiscFunctionsSuite extends QueryTest with SharedSparkSession { 
Seq(Row("a5cf6c42-0c85-418f-af6c-3e4e5b1328f2"))) checkAnswer(df.select(reflect(lit("java.util.UUID"), lit("fromString"), col("a"))), Seq(Row("a5cf6c42-0c85-418f-af6c-3e4e5b1328f2"))) + +checkError( + exception = intercept[AnalysisException] { +df.selectExpr("reflect(cast(null as string), 'fromString', a)") + }, + errorClass = "DATATYPE_MISMATCH.UNEXPECTED_NULL", + parameters = Map( +"exprName" -> "`class`", +"sqlExpr" -> "\"reflect(CAST(NULL AS STRING), fromString, a)\""), + context = ExpectedContext("", "", 0, 45, "reflect(cast(null as string), 'fromString', a)")) +checkError( + exception = inte
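The per-position checks added in the diff above can be pictured with a small sketch. This is a hypothetical Python model of the validation pattern, not Spark's actual API: it checks the `class` (position 0) and `method` (position 1) arguments of `reflect()` for NULL and produces a `DATATYPE_MISMATCH.UNEXPECTED_NULL`-shaped error descriptor.

```python
def check_reflect_args(class_name, method_name):
    """Return an UNEXPECTED_NULL-style error dict for the first NULL
    argument, or None when both arguments pass the check."""
    for expr_name, value in (("class", class_name), ("method", method_name)):
        if value is None:
            # Mirrors DataTypeMismatch(errorSubClass = "UNEXPECTED_NULL",
            # messageParameters = Map("exprName" -> toSQLId(...)))
            return {
                "errorSubClass": "UNEXPECTED_NULL",
                "messageParameters": {"exprName": f"`{expr_name}`"},
            }
    return None  # arguments pass the NULL check

print(check_reflect_args(None, "fromString"))
```

As in the diff, the first failing position wins: a NULL `class` is reported before a NULL `method` would be.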
[spark] branch branch-3.3 updated: [SPARK-45100][SQL][3.3] Fix an internal error from `reflect()` on `NULL` class and method
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new a4d40e8a355 [SPARK-45100][SQL][3.3] Fix an internal error from `reflect()` on `NULL` class and method a4d40e8a355 is described below commit a4d40e8a355f451c6340dec0c90a332434433a75 Author: Max Gekk AuthorDate: Fri Sep 8 18:59:22 2023 +0300 [SPARK-45100][SQL][3.3] Fix an internal error from `reflect()` on `NULL` class and method ### What changes were proposed in this pull request? In the PR, I propose to check that the `class` and `method` arguments are not NULL in `CallMethodViaReflection`, and if they are, throw an `AnalysisException` with the new error class `DATATYPE_MISMATCH.UNEXPECTED_NULL`. This is a backport of https://github.com/apache/spark/pull/42849. ### Why are the changes needed? To fix the issue demonstrated by the example: ```sql $ spark-sql (default)> select reflect('java.util.UUID', CAST(NULL AS STRING)); [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running the new test: ``` $ build/sbt "test:testOnly *.MiscFunctionsSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Authored-by: Max Gekk (cherry picked from commit fd424caf6c46e7030ac2deb2afbe3f4a5fc1095c) Closes #42856 from MaxGekk/fix-internal-error-in-reflect-3.3. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../spark/sql/catalyst/expressions/CallMethodViaReflection.scala | 2 ++ .../src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala | 8 2 files changed, 10 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala index 7cb830d1156..9764d9db7f0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala @@ -65,6 +65,8 @@ case class CallMethodViaReflection(children: Seq[Expression]) } else if (!children.take(2).forall(e => e.dataType == StringType && e.foldable)) { // The first two arguments must be string type. TypeCheckFailure("first two arguments should be string literals") +} else if (children.take(2).exists(_.eval() == null)) { + TypeCheckFailure("first two arguments must be non-NULL") } else if (!classExists) { TypeCheckFailure(s"class $className not found") } else if (children.slice(2, children.length) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala index 37ba52023dd..18262ccd407 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala @@ -34,6 +34,14 @@ class MiscFunctionsSuite extends QueryTest with SharedSparkSession { s"reflect('$className', 'method1', a, b)", s"java_method('$className', 'method1', a, b)"), Row("m1one", "m1one")) +val e1 = intercept[AnalysisException] { + df.selectExpr("reflect(cast(null as string), 'fromString', a)") +} +assert(e1.getMessage.contains("first two arguments must be non-NULL")) +val e2 = intercept[AnalysisException] { + 
df.selectExpr("reflect('java.util.UUID', cast(null as string), a)") +} +assert(e2.getMessage.contains("first two arguments must be non-NULL")) } test("version") { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-43251][SQL] Replace the error class `_LEGACY_ERROR_TEMP_2015` with an internal error
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c7ea3f7d53d [SPARK-43251][SQL] Replace the error class `_LEGACY_ERROR_TEMP_2015` with an internal error c7ea3f7d53d is described below commit c7ea3f7d53d5a7674f3da0db07018c1f0c43dbf6 Author: dengziming AuthorDate: Mon Sep 11 18:28:31 2023 +0300 [SPARK-43251][SQL] Replace the error class `_LEGACY_ERROR_TEMP_2015` with an internal error ### What changes were proposed in this pull request? Replace the legacy error class `_LEGACY_ERROR_TEMP_2015` with an internal error as it is not triggered by the user space. ### Why are the changes needed? As the error is not triggered by the user space, the legacy error class can be replaced by an internal error. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing test cases. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42845 from dengziming/SPARK-43251. Authored-by: dengziming Signed-off-by: Max Gekk --- common/utils/src/main/resources/error/error-classes.json | 5 - .../scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala | 9 +++-- 2 files changed, 3 insertions(+), 11 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 2954d8b9338..282af8c199d 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -4944,11 +4944,6 @@ "Negative values found in " ] }, - "_LEGACY_ERROR_TEMP_2015" : { -"message" : [ - "Cannot generate code for incomparable type: ." -] - }, "_LEGACY_ERROR_TEMP_2016" : { "message" : [ "Can not interpolate into code block." 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala index 2d655be0e70..417ba38c66f 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala @@ -405,12 +405,9 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase with ExecutionE } def cannotGenerateCodeForIncomparableTypeError( - codeType: String, dataType: DataType): SparkIllegalArgumentException = { -new SparkIllegalArgumentException( - errorClass = "_LEGACY_ERROR_TEMP_2015", - messageParameters = Map( -"codeType" -> codeType, -"dataType" -> dataType.catalogString)) + codeType: String, dataType: DataType): Throwable = { +SparkException.internalError( + s"Cannot generate $codeType code for incomparable type: ${toSQLType(dataType)}.") } def cannotInterpolateClassIntoCodeBlockError(arg: Any): SparkIllegalArgumentException = {
[spark] branch master updated (fa2bc21ba1e -> 6565ae47cae)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from fa2bc21ba1e [SPARK-45110][BUILD] Upgrade rocksdbjni to 8.5.3 add 6565ae47cae [SPARK-43252][SQL] Replace the error class `_LEGACY_ERROR_TEMP_2016` with an internal error No new revisions were added by this update. Summary of changes: common/utils/src/main/resources/error/error-classes.json| 5 - .../org/apache/spark/sql/errors/QueryExecutionErrors.scala | 6 ++ .../sql/catalyst/expressions/codegen/CodeBlockSuite.scala | 13 - 3 files changed, 10 insertions(+), 14 deletions(-)
[spark] branch master updated (6565ae47cae -> d8129f837c4)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 6565ae47cae [SPARK-43252][SQL] Replace the error class `_LEGACY_ERROR_TEMP_2016` with an internal error add d8129f837c4 [SPARK-45085][SQL] Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and refactor some logic No new revisions were added by this update. Summary of changes: R/pkg/tests/fulltests/test_sparkSQL.R | 2 +- .../src/main/resources/error/error-classes.json| 17 -- docs/sql-error-conditions.md | 8 --- .../spark/sql/catalyst/analysis/Analyzer.scala | 19 +++ .../sql/catalyst/analysis/v2ResolutionPlans.scala | 4 +- .../spark/sql/errors/QueryCompilationErrors.scala | 52 +++-- .../analyzer-results/change-column.sql.out | 8 +-- .../sql-tests/results/change-column.sql.out| 8 +-- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 4 +- .../apache/spark/sql/execution/SQLViewSuite.scala | 66 +++--- .../spark/sql/execution/SQLViewTestSuite.scala | 4 +- .../spark/sql/execution/command/DDLSuite.scala | 6 +- .../execution/command/TruncateTableSuiteBase.scala | 10 ++-- .../execution/command/v1/ShowPartitionsSuite.scala | 10 ++-- .../apache/spark/sql/internal/CatalogSuite.scala | 4 +- 15 files changed, 80 insertions(+), 142 deletions(-)
[spark] branch master updated: [SPARK-44911][SQL] Create hive table with invalid column should return error class
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1e03db36a93 [SPARK-44911][SQL] Create hive table with invalid column should return error class 1e03db36a93 is described below commit 1e03db36a939aea5b4d55059967ccde96cb29564 Author: ming95 <505306...@qq.com> AuthorDate: Tue Sep 12 11:55:08 2023 +0300 [SPARK-44911][SQL] Create hive table with invalid column should return error class ### What changes were proposed in this pull request? Creating a Hive table with an invalid column name should return an error class. Run the SQL: ``` create table test stored as parquet as select id, date'2018-01-01' + make_dt_interval(0, id) from range(0, 10) ``` Before this change, the error was: ``` org.apache.spark.sql.AnalysisException: Cannot create a table having a column whose name contains commas in Hive metastore. Table: `spark_catalog`.`default`.`test`; Column: DATE '2018-01-01' + make_dt_interval(0, id, 0, 0.00) at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$verifyDataSchema$4(HiveExternalCatalog.scala:175) at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$verifyDataSchema$4$adapted(HiveExternalCatalog.scala:171) at scala.collection.Iterator.foreach(Iterator.scala:943) ``` After this change: ``` Exception in thread "main" org.apache.spark.sql.AnalysisException: [INVALID_HIVE_COLUMN_NAME] Cannot create the table `spark_catalog`.`default`.`parquet_ds1` having the column `DATE '2018-01-01' + make_dt_interval(0, id, 0, 0`.`00)` whose name contains invalid characters ',' in Hive metastore. 
at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$verifyDataSchema$4(HiveExternalCatalog.scala:180) at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$verifyDataSchema$4$adapted(HiveExternalCatalog.scala:171) at scala.collection.Iterator.foreach(Iterator.scala:943) ``` ### Why are the changes needed? as above ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? add UT ### Was this patch authored or co-authored using generative AI tooling? no Closes #42609 from ming95/SPARK-44911. Authored-by: ming95 <505306...@qq.com> Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 2 +- docs/sql-error-conditions.md | 2 +- .../spark/sql/hive/HiveExternalCatalog.scala | 11 --- .../spark/sql/hive/execution/HiveDDLSuite.scala| 21 .../spark/sql/hive/execution/SQLQuerySuite.scala | 23 +++--- 5 files changed, 47 insertions(+), 12 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 415bdbaf42a..4740ed72f89 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -1587,7 +1587,7 @@ }, "INVALID_HIVE_COLUMN_NAME" : { "message" : [ - "Cannot create the table having the nested column whose name contains invalid characters in Hive metastore." + "Cannot create the table having the column whose name contains invalid characters in Hive metastore." ] }, "INVALID_IDENTIFIER" : { diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md index 0d54938593c..444c2b7c0d1 100644 --- a/docs/sql-error-conditions.md +++ b/docs/sql-error-conditions.md @@ -971,7 +971,7 @@ For more details see [INVALID_HANDLE](sql-error-conditions-invalid-handle-error- SQLSTATE: none assigned -Cannot create the table `` having the nested column `` whose name contains invalid characters `` in Hive metastore. 
+Cannot create the table `` having the column `` whose name contains invalid characters `` in Hive metastore. ### INVALID_IDENTIFIER diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala index e4325989b70..67292460bbc 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala @@ -42,7 +42,7 @@ import org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils._ import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.types.DataTypeUtils import org.apache.spark.sql.catalyst.util.{CaseInsensitiveMap, CharVarcharUtils} -import or
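As a rough model of the check described above (purely illustrative; the real logic lives in `HiveExternalCatalog`'s schema verification, and the invalid-character list here is an assumption), the improved error reports which invalid character a column name contains rather than a generic "contains commas" message:

```python
def verify_column_name(table, column, invalid_chars=(",",)):
    """Raise an INVALID_HIVE_COLUMN_NAME-style error when the column name
    contains a character the Hive metastore cannot accept."""
    for ch in invalid_chars:
        if ch in column:
            raise ValueError(
                f"[INVALID_HIVE_COLUMN_NAME] Cannot create the table {table} "
                f"having the column `{column}` whose name contains invalid "
                f"characters '{ch}' in Hive metastore.")

verify_column_name("`default`.`test`", "id")  # a plain name passes
```

An unaliased expression like `DATE '2018-01-01' + make_dt_interval(0, id)` becomes a column name containing commas, which is what trips this check; aliasing the expression avoids the error entirely.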
[spark] branch master updated: [SPARK-45162][SQL] Support maps and array parameters constructed via `call_function`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new cd672b09ac6 [SPARK-45162][SQL] Support maps and array parameters constructed via `call_function` cd672b09ac6 is described below commit cd672b09ac69724cd99dc12c9bb49dd117025be1 Author: Max Gekk AuthorDate: Thu Sep 14 11:31:56 2023 +0300 [SPARK-45162][SQL] Support maps and array parameters constructed via `call_function` ### What changes were proposed in this pull request? In the PR, I propose to move the `BindParameters` rules from the `Substitution` to the `Resolution` batch, and change types of the `args` parameter of `NameParameterizedQuery` and `PosParameterizedQuery` to an `Iterable` to resolve argument expressions. ### Why are the changes needed? After the PR, the parameterized `sql()` allows map/array/struct constructed by functions like `map()`, `array()`, and `struct()`, but the same functions invoked via `call_function` are not supported: ```scala scala> sql("SELECT element_at(:mapParam, 'a')", Map("mapParam" -> call_function("map", lit("a"), lit(1)))) org.apache.spark.sql.catalyst.ExtendedAnalysisException: [UNBOUND_SQL_PARAMETER] Found the unbound parameter: mapParam. Please, fix `args` and provide a mapping of the parameter to a SQL literal.; line 1 pos 18; ``` ### Does this PR introduce _any_ user-facing change? No, should not since it fixes an issue. Only if user code depends on the error message. After the changes: ```scala scala> sql("SELECT element_at(:mapParam, 'a')", Map("mapParam" -> call_function("map", lit("a"), lit(1)))).show(false) ++ |element_at(map(a, 1), a)| ++ |1 | ++ ``` ### How was this patch tested? By running new tests: ``` $ build/sbt "test:testOnly *ParametersSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. 
Closes #42894 from MaxGekk/fix-parameterized-sql-unresolved. Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../sql/connect/planner/SparkConnectPlanner.scala | 2 +- .../spark/sql/catalyst/analysis/Analyzer.scala | 2 +- .../spark/sql/catalyst/analysis/parameters.scala | 28 +- .../sql/catalyst/analysis/AnalysisSuite.scala | 4 ++-- .../org/apache/spark/sql/ParametersSuite.scala | 19 --- 5 files changed, 42 insertions(+), 13 deletions(-) diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala index 24dee006f0b..74a8ff290eb 100644 --- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala +++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala @@ -269,7 +269,7 @@ class SparkConnectPlanner(val sessionHolder: SessionHolder) extends Logging { if (!args.isEmpty) { NameParameterizedQuery(parsedPlan, args.asScala.mapValues(transformLiteral).toMap) } else if (!posArgs.isEmpty) { - PosParameterizedQuery(parsedPlan, posArgs.asScala.map(transformLiteral).toArray) + PosParameterizedQuery(parsedPlan, posArgs.asScala.map(transformLiteral).toSeq) } else { parsedPlan } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index e15b9730111..6491a4eea95 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -260,7 +260,6 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor // at the beginning of analysis. 
OptimizeUpdateFields, CTESubstitution, - BindParameters, WindowsSubstitution, EliminateUnions, SubstituteUnresolvedOrdinals), @@ -322,6 +321,7 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor RewriteDeleteFromTable :: RewriteUpdateTable :: RewriteMergeIntoTable :: + BindParameters :: typeCoercionRules ++ Seq( ResolveWithCTE, diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala index 13404797490
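The effect of `BindParameters` can be pictured with a toy substitution step. This is a simplified model, not Spark's rule: Spark substitutes whole expression trees during resolution (which is exactly what lets `map()`-style constructor arguments be resolved first), whereas this sketch only substitutes tokens in a flat query.

```python
def bind_parameters(tokens, args):
    """Replace each `:name` placeholder token with its bound argument;
    an unbound name raises an UNBOUND_SQL_PARAMETER-style error."""
    bound = []
    for tok in tokens:
        if tok.startswith(":"):
            name = tok[1:]
            if name not in args:
                raise KeyError(
                    f"[UNBOUND_SQL_PARAMETER] Found the unbound parameter: {name}.")
            bound.append(args[name])
        else:
            bound.append(tok)
    return bound

print(bind_parameters(["element_at(", ":mapParam", ", 'a')"],
                      {"mapParam": "map('a', 1)"}))
```

The bug fixed above was an ordering problem, not a substitution problem: binding too early (in the `Substitution` batch) meant constructor-function arguments were still unresolved when the rule looked for them.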
[spark] branch master updated (cd672b09ac6 -> 6653f94d489)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from cd672b09ac6 [SPARK-45162][SQL] Support maps and array parameters constructed via `call_function` add 6653f94d489 [SPARK-45156][SQL] Wrap `inputName` by backticks in the `NON_FOLDABLE_INPUT` error class No new revisions were added by this update. Summary of changes: .../expressions/CallMethodViaReflection.scala | 4 ++-- .../aggregate/ApproxCountDistinctForIntervals.scala | 2 +- .../expressions/aggregate/ApproximatePercentile.scala | 4 ++-- .../expressions/aggregate/BloomFilterAggregate.scala | 4 ++-- .../expressions/aggregate/CountMinSketchAgg.scala | 6 +++--- .../expressions/aggregate/HistogramNumeric.scala | 2 +- .../catalyst/expressions/aggregate/percentiles.scala | 2 +- .../sql/catalyst/expressions/csvExpressions.scala | 2 +- .../spark/sql/catalyst/expressions/generators.scala | 2 +- .../sql/catalyst/expressions/jsonExpressions.scala| 2 +- .../sql/catalyst/expressions/maskExpressions.scala| 2 +- .../sql/catalyst/expressions/mathExpressions.scala| 2 +- .../sql/catalyst/expressions/regexpExpressions.scala | 2 +- .../sql/catalyst/expressions/stringExpressions.scala | 2 +- .../sql/catalyst/expressions/windowExpressions.scala | 19 +++ .../spark/sql/catalyst/expressions/xml/xpath.scala| 2 +- .../sql/catalyst/expressions/xmlExpressions.scala | 2 +- .../analysis/ExpressionTypeCheckingSuite.scala| 6 +++--- .../expressions/CallMethodViaReflectionSuite.scala| 2 +- .../catalyst/expressions/RegexpExpressionsSuite.scala | 2 +- .../catalyst/expressions/StringExpressionsSuite.scala | 6 +++--- .../ApproxCountDistinctForIntervalsSuite.scala| 2 +- .../aggregate/ApproximatePercentileSuite.scala| 4 ++-- .../aggregate/CountMinSketchAggSuite.scala| 6 +++--- .../expressions/aggregate/HistogramNumericSuite.scala | 2 +- .../expressions/aggregate/PercentileSuite.scala | 2 +- 
.../expressions/xml/XPathExpressionSuite.scala| 2 +- .../analyzer-results/ansi/string-functions.sql.out| 2 +- .../sql-tests/analyzer-results/csv-functions.sql.out | 2 +- .../sql-tests/analyzer-results/join-lateral.sql.out | 2 +- .../sql-tests/analyzer-results/json-functions.sql.out | 2 +- .../sql-tests/analyzer-results/mask-functions.sql.out | 4 ++-- .../sql-tests/analyzer-results/percentiles.sql.out| 2 +- .../analyzer-results/string-functions.sql.out | 2 +- .../sql-tests/results/ansi/string-functions.sql.out | 2 +- .../resources/sql-tests/results/csv-functions.sql.out | 2 +- .../resources/sql-tests/results/join-lateral.sql.out | 2 +- .../sql-tests/results/json-functions.sql.out | 2 +- .../sql-tests/results/mask-functions.sql.out | 4 ++-- .../resources/sql-tests/results/percentiles.sql.out | 2 +- .../sql-tests/results/string-functions.sql.out| 2 +- .../spark/sql/DataFrameWindowFunctionsSuite.scala | 2 +- .../org/apache/spark/sql/GeneratorFunctionSuite.scala | 2 +- 43 files changed, 67 insertions(+), 64 deletions(-)
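The backtick-wrapping that SPARK-45156 applies to `inputName` follows Spark's identifier-quoting convention for error messages. A small approximation of that quoting (the actual rules live in Spark's `toSQLId`/`quoteIfNeeded` helpers, so treat this as a sketch):

```python
def to_sql_id(name: str) -> str:
    # Quote each dot-separated part in backticks, doubling any embedded
    # backtick -- the style used in NON_FOLDABLE_INPUT error messages.
    return ".".join("`" + part.replace("`", "``") + "`"
                    for part in name.split("."))

print(to_sql_id("inputName"))  # `inputName`
print(to_sql_id("a.b"))        # `a`.`b`
```

Quoting identifiers consistently lets users distinguish a parameter name from surrounding message text, which is the point of the change.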
[spark] branch master updated: [SPARK-45078][SQL] Fix `array_insert` ImplicitCastInputTypes not work
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e84c66db60c [SPARK-45078][SQL] Fix `array_insert` ImplicitCastInputTypes not work e84c66db60c is described below commit e84c66db60c78476806161479344cd32a7606ab1 Author: Jia Fan AuthorDate: Sun Sep 17 11:16:24 2023 +0300 [SPARK-45078][SQL] Fix `array_insert` ImplicitCastInputTypes not work ### What changes were proposed in this pull request? This PR fixes `array_insert` throwing an exception when the inserted column's type differs from the array's element type, even in cases that should execute successfully, e.g.: ```sql select array_insert(array(1), 2, cast(2 as tinyint)) ``` `ImplicitCastInputTypes` in `ArrayInsert` currently always returns an empty array, so Spark cannot convert `tinyint` to `int`. ### Why are the changes needed? To fix erroneous behavior in `array_insert`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added a new test. ### Was this patch authored or co-authored using generative AI tooling? No Closes #42951 from Hisoka-X/SPARK-45078_arrayinsert_type_mismatch. 
Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../spark/sql/catalyst/expressions/collectionOperations.scala | 1 - .../test/resources/sql-tests/analyzer-results/ansi/array.sql.out | 7 +++ .../src/test/resources/sql-tests/analyzer-results/array.sql.out | 7 +++ sql/core/src/test/resources/sql-tests/inputs/array.sql| 1 + sql/core/src/test/resources/sql-tests/results/ansi/array.sql.out | 8 sql/core/src/test/resources/sql-tests/results/array.sql.out | 8 6 files changed, 31 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala index 957aa1ab2d5..9c9127efb17 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala @@ -4749,7 +4749,6 @@ case class ArrayInsert( } case (e1, e2, e3) => Seq.empty } -Seq.empty } override def checkInputDataTypes(): TypeCheckResult = { diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/ansi/array.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/ansi/array.sql.out index cd101c7a524..6fc30815793 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/ansi/array.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/ansi/array.sql.out @@ -531,6 +531,13 @@ Project [array_insert(array(2, 3, cast(null as int), 4), -5, 1, false) AS array_ +- OneRowRelation +-- !query +select array_insert(array(1), 2, cast(2 as tinyint)) +-- !query analysis +Project [array_insert(array(1), 2, cast(cast(2 as tinyint) as int), false) AS array_insert(array(1), 2, CAST(2 AS TINYINT))#x] ++- OneRowRelation + + -- !query set spark.sql.legacy.negativeIndexInArrayInsert=true -- !query analysis diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/array.sql.out 
b/sql/core/src/test/resources/sql-tests/analyzer-results/array.sql.out index 8279fb3362e..e0585b77cb6 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/array.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/array.sql.out @@ -531,6 +531,13 @@ Project [array_insert(array(2, 3, cast(null as int), 4), -5, 1, false) AS array_ +- OneRowRelation +-- !query +select array_insert(array(1), 2, cast(2 as tinyint)) +-- !query analysis +Project [array_insert(array(1), 2, cast(cast(2 as tinyint) as int), false) AS array_insert(array(1), 2, CAST(2 AS TINYINT))#x] ++- OneRowRelation + + -- !query set spark.sql.legacy.negativeIndexInArrayInsert=true -- !query analysis diff --git a/sql/core/src/test/resources/sql-tests/inputs/array.sql b/sql/core/src/test/resources/sql-tests/inputs/array.sql index 48edc6b4742..52a0906ea73 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/array.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/array.sql @@ -141,6 +141,7 @@ select array_insert(array(1, 2, 3, NULL), cast(NULL as INT), 4); select array_insert(array(1, 2, 3, NULL), 4, cast(NULL as INT)); select array_insert(array(2, 3, NULL, 4), 5, 5); select array_insert(array(2, 3, NULL, 4), -5, 1); +select array_insert(array(1), 2, cast(2 as tinyint)); set spark.sql.legacy.negativeIndexInArrayInsert=true; select array_insert(array(1, 3, 4), -2, 2); diff --git a/sql/core/src/test/resources/sql-tests/results/ansi/array.sql.out b/sql/core/src/test/resources/sql-te
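The fix above re-enables implicit widening between the array's element type and the inserted value's type. A simplified model of that decision (illustrative only; Spark's real coercion goes through `ImplicitCastInputTypes` and the type-coercion rules, and this table covers just the integral types):

```python
# Byte widths of Spark's integral types; the wider type wins, so inserting
# a tinyint into array<int> implicitly casts the value to int.
WIDTH = {"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8}

def implicit_cast_target(element_type, insert_type):
    if element_type == insert_type:
        return element_type
    if element_type in WIDTH and insert_type in WIDTH:
        return max(element_type, insert_type, key=WIDTH.get)
    return None  # no implicit integral widening applies

print(implicit_cast_target("int", "tinyint"))  # int, as in the new test case
```

The bug was mechanical: a stray trailing `Seq.empty` discarded the computed cast targets, which is why the one-line deletion in the diff is the whole fix.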
[spark] branch branch-3.5 updated: [SPARK-45078][SQL] Fix `array_insert` ImplicitCastInputTypes not work
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 723a85eb2df [SPARK-45078][SQL] Fix `array_insert` ImplicitCastInputTypes not work 723a85eb2df is described below commit 723a85eb2dffa69571cba841380eb759a9b89321 Author: Jia Fan AuthorDate: Sun Sep 17 11:16:24 2023 +0300 [SPARK-45078][SQL] Fix `array_insert` ImplicitCastInputTypes not work ### What changes were proposed in this pull request? This PR fixes `array_insert` throwing an exception when the inserted column's type differs from the array's element type, even in cases that should execute successfully, e.g.: ```sql select array_insert(array(1), 2, cast(2 as tinyint)) ``` `ImplicitCastInputTypes` in `ArrayInsert` currently always returns an empty array, so Spark cannot convert `tinyint` to `int`. ### Why are the changes needed? To fix erroneous behavior in `array_insert`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added a new test. ### Was this patch authored or co-authored using generative AI tooling? No Closes #42951 from Hisoka-X/SPARK-45078_arrayinsert_type_mismatch. 
Authored-by: Jia Fan Signed-off-by: Max Gekk (cherry picked from commit e84c66db60c78476806161479344cd32a7606ab1) Signed-off-by: Max Gekk --- .../spark/sql/catalyst/expressions/collectionOperations.scala | 1 - .../test/resources/sql-tests/analyzer-results/ansi/array.sql.out | 7 +++ .../src/test/resources/sql-tests/analyzer-results/array.sql.out | 7 +++ sql/core/src/test/resources/sql-tests/inputs/array.sql| 1 + sql/core/src/test/resources/sql-tests/results/ansi/array.sql.out | 8 sql/core/src/test/resources/sql-tests/results/array.sql.out | 8 6 files changed, 31 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala index fe9c4015c15..ade4a6c5be7 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala @@ -4711,7 +4711,6 @@ case class ArrayInsert( } case (e1, e2, e3) => Seq.empty } -Seq.empty } override def checkInputDataTypes(): TypeCheckResult = { diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/ansi/array.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/ansi/array.sql.out index cd101c7a524..6fc30815793 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/ansi/array.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/ansi/array.sql.out @@ -531,6 +531,13 @@ Project [array_insert(array(2, 3, cast(null as int), 4), -5, 1, false) AS array_ +- OneRowRelation +-- !query +select array_insert(array(1), 2, cast(2 as tinyint)) +-- !query analysis +Project [array_insert(array(1), 2, cast(cast(2 as tinyint) as int), false) AS array_insert(array(1), 2, CAST(2 AS TINYINT))#x] ++- OneRowRelation + + -- !query set spark.sql.legacy.negativeIndexInArrayInsert=true -- !query analysis 
diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/array.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/array.sql.out index 8279fb3362e..e0585b77cb6 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/array.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/array.sql.out @@ -531,6 +531,13 @@ Project [array_insert(array(2, 3, cast(null as int), 4), -5, 1, false) AS array_ +- OneRowRelation +-- !query +select array_insert(array(1), 2, cast(2 as tinyint)) +-- !query analysis +Project [array_insert(array(1), 2, cast(cast(2 as tinyint) as int), false) AS array_insert(array(1), 2, CAST(2 AS TINYINT))#x] ++- OneRowRelation + + -- !query set spark.sql.legacy.negativeIndexInArrayInsert=true -- !query analysis diff --git a/sql/core/src/test/resources/sql-tests/inputs/array.sql b/sql/core/src/test/resources/sql-tests/inputs/array.sql index 48edc6b4742..52a0906ea73 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/array.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/array.sql @@ -141,6 +141,7 @@ select array_insert(array(1, 2, 3, NULL), cast(NULL as INT), 4); select array_insert(array(1, 2, 3, NULL), 4, cast(NULL as INT)); select array_insert(array(2, 3, NULL, 4), 5, 5); select array_insert(array(2, 3, NULL, 4), -5, 1); +select array_insert(array(1), 2, cast(2 as tinyint)); set spark.sql.legacy.negativeIndexInArrayInsert=true; select array_insert(array(1, 3, 4), -2, 2); diff --gi
[spark] branch master updated: [SPARK-45034][SQL] Support deterministic mode function
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f5365d0dc59 [SPARK-45034][SQL] Support deterministic mode function f5365d0dc59 is described below commit f5365d0dc590d4965a269da223dbd72fbb764595 Author: Peter Toth AuthorDate: Sun Sep 17 21:37:57 2023 +0300 [SPARK-45034][SQL] Support deterministic mode function ### What changes were proposed in this pull request? This PR adds a new optional argument to the `mode` aggregate function to provide deterministic results. When multiple values have the same greatest frequency, the new boolean argument can be used to get the lowest or highest value instead of an arbitrary one. ### Why are the changes needed? To make the function more user-friendly. ### Does this PR introduce _any_ user-facing change? Yes, it adds a new argument to the `mode` function. ### How was this patch tested? Added new UTs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42755 from peter-toth/SPARK-45034-deterministic-mode-function. 
Authored-by: Peter Toth Signed-off-by: Max Gekk --- .../scala/org/apache/spark/sql/functions.scala | 14 ++- .../explain-results/function_mode.explain | 2 +- .../query-tests/queries/function_mode.json | 4 + .../query-tests/queries/function_mode.proto.bin| Bin 173 -> 179 bytes python/pyspark/sql/connect/functions.py| 4 +- python/pyspark/sql/functions.py| 35 -- .../sql/catalyst/expressions/aggregate/Mode.scala | 76 ++-- .../scala/org/apache/spark/sql/functions.scala | 16 ++- .../sql-functions/sql-expression-schema.md | 2 +- .../sql-tests/analyzer-results/group-by.sql.out| 120 ++- .../test/resources/sql-tests/inputs/group-by.sql | 11 ++ .../resources/sql-tests/results/group-by.sql.out | 132 - .../apache/spark/sql/DatasetAggregatorSuite.scala | 10 ++ 13 files changed, 397 insertions(+), 29 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala index b2102d4ba55..83f0ee64501 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala @@ -827,7 +827,19 @@ object functions { * @group agg_funcs * @since 3.4.0 */ - def mode(e: Column): Column = Column.fn("mode", e) + def mode(e: Column): Column = mode(e, deterministic = false) + + /** + * Aggregate function: returns the most frequent value in a group. + * + * When multiple values have the same greatest frequency then either any of values is returned + * if deterministic is false or is not defined, or the lowest value is returned if deterministic + * is true. + * + * @group agg_funcs + * @since 4.0.0 + */ + def mode(e: Column, deterministic: Boolean): Column = Column.fn("mode", e, lit(deterministic)) /** * Aggregate function: returns the maximum value of the expression in a group. 
diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_mode.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_mode.explain index dfa2113a2c3..28bbb44b0fd 100644 --- a/connector/connect/common/src/test/resources/query-tests/explain-results/function_mode.explain +++ b/connector/connect/common/src/test/resources/query-tests/explain-results/function_mode.explain @@ -1,2 +1,2 @@ -Aggregate [mode(a#0, 0, 0) AS mode(a)#0] +Aggregate [mode(a#0, 0, 0, false) AS mode(a, false)#0] +- LocalRelation , [id#0L, a#0, b#0, d#0, e#0, f#0, g#0] diff --git a/connector/connect/common/src/test/resources/query-tests/queries/function_mode.json b/connector/connect/common/src/test/resources/query-tests/queries/function_mode.json index 8e8183e9e08..5c26edee803 100644 --- a/connector/connect/common/src/test/resources/query-tests/queries/function_mode.json +++ b/connector/connect/common/src/test/resources/query-tests/queries/function_mode.json @@ -18,6 +18,10 @@ "unresolvedAttribute": { "unparsedIdentifier": "a" } +}, { + "literal": { +"boolean": false + } }] } }] diff --git a/connector/connect/common/src/test/resources/query-tests/queries/function_mode.proto.bin b/connector/connect/common/src/test/resources/query-tests/queries/function_mode.proto.bin index dca0953a387..cc115e43172 100644 Binary files a/connector/connect/comm
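The tie-breaking rule that the new `deterministic` flag introduces can be sketched in plain Python. This is an illustrative model of the documented semantics, not Spark's aggregate implementation:

```python
from collections import Counter

def mode(values, deterministic=False):
    """Return the most frequent value in `values`.

    When several values share the greatest frequency, deterministic=False may
    return any of them, while deterministic=True returns the lowest one,
    mirroring the new boolean argument of Spark's `mode` aggregate.
    """
    counts = Counter(values)
    top = max(counts.values())
    candidates = [v for v, c in counts.items() if c == top]
    return min(candidates) if deterministic else candidates[0]
```

With `deterministic=True`, `mode([1, 1, 2, 2, 3])` always yields `1`; without it, either `1` or `2` would be an acceptable answer.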
[spark] branch master updated (8d363c6e2c8 -> 0dda75f824d)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 8d363c6e2c8 [SPARK-45196][PYTHON][DOCS] Refine docstring of `array/array_contains/arrays_overlap` add 0dda75f824d [SPARK-45137][CONNECT] Support map/array parameters in parameterized `sql()` No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/SparkSession.scala | 6 +- .../org/apache/spark/sql/ClientE2ETestSuite.scala | 7 + .../src/main/protobuf/spark/connect/commands.proto | 12 +- .../main/protobuf/spark/connect/relations.proto| 12 +- .../sql/connect/planner/SparkConnectPlanner.scala | 26 +- python/pyspark/sql/connect/proto/commands_pb2.py | 164 +++-- python/pyspark/sql/connect/proto/commands_pb2.pyi | 60 - python/pyspark/sql/connect/proto/relations_pb2.py | 268 +++-- python/pyspark/sql/connect/proto/relations_pb2.pyi | 60 - 9 files changed, 396 insertions(+), 219 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
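Conceptually, parameterized `sql()` replaces `:name` and `?` markers with the supplied values. The sketch below models that substitution textually for illustration only; Spark itself binds the parameters as expressions in the analyzed plan (which is what lets map/array constructor columns work), so the helper names here are hypothetical:

```python
import re

def _lit(v):
    """Render a Python value as a SQL literal; strings are quoted and escaped."""
    if isinstance(v, str):
        return "'" + v.replace("'", "''") + "'"
    return str(v)

def bind_named(query, args):
    """Substitute `:name` markers using a dict of parameter values."""
    return re.sub(r":(\w+)", lambda m: _lit(args[m.group(1)]), query)

def bind_positional(query, args):
    """Substitute `?` markers left-to-right from a sequence of values."""
    it = iter(args)
    return re.sub(r"\?", lambda m: _lit(next(it)), query)
```

For example, `bind_named("SELECT * FROM t WHERE b > :minB", {"minB": 5})` yields `SELECT * FROM t WHERE b > 5`.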
[spark] branch master updated: [SPARK-45188][SQL][DOCS] Update error messages related to parameterized `sql()`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 981312284f0 [SPARK-45188][SQL][DOCS] Update error messages related to parameterized `sql()` 981312284f0 is described below commit 981312284f0776ca847c8d21411f74a72c639b22 Author: Max Gekk AuthorDate: Tue Sep 19 00:22:43 2023 +0300 [SPARK-45188][SQL][DOCS] Update error messages related to parameterized `sql()` ### What changes were proposed in this pull request? In the PR, I propose to update some error formats and comments regarding `sql()` parameters - maps, arrays and struct might be used as `sql()` parameters. New behaviour has been added by https://github.com/apache/spark/pull/42752. ### Why are the changes needed? To inform users about recent changes introduced by SPARK-45033. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running the affected test suite: ``` $ build/sbt "core/testOnly *SparkThrowableSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42957 from MaxGekk/clean-ClientE2ETestSuite. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json | 4 ++-- .../scala/org/apache/spark/sql/SparkSession.scala| 11 +++ docs/sql-error-conditions.md | 4 ++-- python/pyspark/pandas/sql_formatter.py | 3 ++- python/pyspark/sql/session.py| 3 ++- .../spark/sql/catalyst/analysis/parameters.scala | 14 +- .../scala/org/apache/spark/sql/SparkSession.scala| 20 ++-- 7 files changed, 34 insertions(+), 25 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 4740ed72f89..186e7b4640d 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -1892,7 +1892,7 @@ }, "INVALID_SQL_ARG" : { "message" : [ - "The argument of `sql()` is invalid. Consider to replace it by a SQL literal." + "The argument of `sql()` is invalid. Consider to replace it either by a SQL literal or by collection constructor functions such as `map()`, `array()`, `struct()`." ] }, "INVALID_SQL_SYNTAX" : { @@ -2768,7 +2768,7 @@ }, "UNBOUND_SQL_PARAMETER" : { "message" : [ - "Found the unbound parameter: . Please, fix `args` and provide a mapping of the parameter to a SQL literal." + "Found the unbound parameter: . Please, fix `args` and provide a mapping of the parameter to either a SQL literal or collection constructor functions such as `map()`, `array()`, `struct()`." 
], "sqlState" : "42P02" }, diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala index 8788e34893e..5aa8c5a2bd5 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala @@ -235,8 +235,9 @@ class SparkSession private[sql] ( * An array of Java/Scala objects that can be converted to SQL literal expressions. See https://spark.apache.org/docs/latest/sql-ref-datatypes.html";> Supported Data * Types for supported value types in Scala/Java. For example: 1, "Steven", - * LocalDate.of(2023, 4, 2). A value can be also a `Column` of literal expression, in that - * case it is taken as is. + * LocalDate.of(2023, 4, 2). A value can be also a `Column` of a literal or collection + * constructor functions such as `map()`, `array()`, `struct()`, in that case it is taken as + * is. * * @since 3.5.0 */ @@ -272,7 +273,8 @@ class SparkSession private[sql] ( * expressions. See https://spark.apache.org/docs/latest/sql-ref-datatypes.html";> * Supported Data Types for supported value types in Scala/Java. For example, map keys: * "rank", "name", "birthdate"; map values: 1, "Steven", LocalDate.of(2023, 4, 2). Map value - * can be also a `Column` of literal expression, in that case it is taken as is. + * can be also a `Column` of a literal or collection constructor functions such as `map()`, + * `array()`, `struct()`, in that case it is taken as is. * * @since 3.4.0 */ @@ -292,7 +294,8 @@ class SparkSession private[sql] ( * e
[spark] branch master updated: [SPARK-45224][PYTHON] Add examples w/ map and array as parameters of `sql()`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c89221b02bb3 [SPARK-45224][PYTHON] Add examples w/ map and array as parameters of `sql()` c89221b02bb3 is described below commit c89221b02bb3000f707a31322e6d40b561e527bd Author: Max Gekk AuthorDate: Wed Sep 20 11:09:01 2023 +0300 [SPARK-45224][PYTHON] Add examples w/ map and array as parameters of `sql()` ### What changes were proposed in this pull request? In the PR, I propose to add a few more examples for the `sql()` method in PySpark API with array and map parameters. ### Why are the changes needed? To inform users about recent changes introduced by #42752 and #42470, and check the changes work actually. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running new examples: ``` $ python/run-tests --parallelism=1 --testnames 'pyspark.sql.session SparkSession.sql' ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42996 from MaxGekk/map-sql-parameterized-python-connect. Authored-by: Max Gekk Signed-off-by: Max Gekk --- python/pyspark/sql/session.py | 30 +- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py index dc4f8f321a59..de2e8d0cda2a 100644 --- a/python/pyspark/sql/session.py +++ b/python/pyspark/sql/session.py @@ -1599,23 +1599,27 @@ class SparkSession(SparkConversionMixin): And substitude named parameters with the `:` prefix by SQL literals. ->>> spark.sql("SELECT * FROM {df} WHERE {df[B]} > :minB", {"minB" : 5}, df=mydf).show() -+---+---+ -| A| B| -+---+---+ -| 3| 6| -+---+---+ +>>> from pyspark.sql.functions import create_map +>>> spark.sql( +... "SELECT *, element_at(:m, 'a') AS C FROM {df} WHERE {df[B]} > :minB", +... 
{"minB" : 5, "m" : create_map(lit('a'), lit(1))}, df=mydf).show() ++---+---+---+ +| A| B| C| ++---+---+---+ +| 3| 6| 1| ++---+---+---+ Or positional parameters marked by `?` in the SQL query by SQL literals. +>>> from pyspark.sql.functions import array >>> spark.sql( -... "SELECT * FROM {df} WHERE {df[B]} > ? and ? < {df[A]}", -... args=[5, 2], df=mydf).show() -+---+---+ -| A| B| -+---+---+ -| 3| 6| -+---+---+ +... "SELECT *, element_at(?, 1) AS C FROM {df} WHERE {df[B]} > ? and ? < {df[A]}", +... args=[array(lit(1), lit(2), lit(3)), 5, 2], df=mydf).show() ++---+---+---+ +| A| B| C| ++---+---+---+ +| 3| 6| 1| ++---+---+---+ """ formatter = SQLStringFormatter(self) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-45235][CONNECT][PYTHON] Support map and array parameters by `sql()`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a2bab5efc5b [SPARK-45235][CONNECT][PYTHON] Support map and array parameters by `sql()` a2bab5efc5b is described below commit a2bab5efc5b5f0e841e9b34ccbfd2cb99af5923e Author: Max Gekk AuthorDate: Thu Sep 21 09:05:30 2023 +0300 [SPARK-45235][CONNECT][PYTHON] Support map and array parameters by `sql()` ### What changes were proposed in this pull request? In the PR, I propose to change the Python connect client to support `Column` as a parameter of `sql()`. ### Why are the changes needed? To achieve feature parity w/ regular PySpark which supports map and arrays as parameters of `sql()`, see https://github.com/apache/spark/pull/42996. ### Does this PR introduce _any_ user-facing change? No. It fixes a bug. ### How was this patch tested? By running the modified tests: ``` $ python/run-tests --parallelism=1 --testnames 'pyspark.sql.tests.connect.test_connect_basic SparkConnectBasicTests.test_sql_with_named_args' $ python/run-tests --parallelism=1 --testnames 'pyspark.sql.tests.connect.test_connect_basic SparkConnectBasicTests.test_sql_with_pos_args' ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43014 from MaxGekk/map-sql-parameterized-python-connect-2. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- python/pyspark/sql/connect/plan.py | 22 ++ python/pyspark/sql/connect/session.py | 2 +- .../sql/tests/connect/test_connect_basic.py| 12 3 files changed, 19 insertions(+), 17 deletions(-) diff --git a/python/pyspark/sql/connect/plan.py b/python/pyspark/sql/connect/plan.py index 3e8db2aae09..d069081e1af 100644 --- a/python/pyspark/sql/connect/plan.py +++ b/python/pyspark/sql/connect/plan.py @@ -1049,6 +1049,12 @@ class SQL(LogicalPlan): self._query = query self._args = args +def _to_expr(self, session: "SparkConnectClient", v: Any) -> proto.Expression: +if isinstance(v, Column): +return v.to_plan(session) +else: +return LiteralExpression._from_value(v).to_plan(session) + def plan(self, session: "SparkConnectClient") -> proto.Relation: plan = self._create_proto_relation() plan.sql.query = self._query @@ -1056,14 +1062,10 @@ class SQL(LogicalPlan): if self._args is not None and len(self._args) > 0: if isinstance(self._args, Dict): for k, v in self._args.items(): -plan.sql.args[k].CopyFrom( - LiteralExpression._from_value(v).to_plan(session).literal -) + plan.sql.named_arguments[k].CopyFrom(self._to_expr(session, v)) else: for v in self._args: -plan.sql.pos_args.append( - LiteralExpression._from_value(v).to_plan(session).literal -) +plan.sql.pos_arguments.append(self._to_expr(session, v)) return plan @@ -1073,14 +1075,10 @@ class SQL(LogicalPlan): if self._args is not None and len(self._args) > 0: if isinstance(self._args, Dict): for k, v in self._args.items(): -cmd.sql_command.args[k].CopyFrom( - LiteralExpression._from_value(v).to_plan(session).literal -) + cmd.sql_command.named_arguments[k].CopyFrom(self._to_expr(session, v)) else: for v in self._args: -cmd.sql_command.pos_args.append( - LiteralExpression._from_value(v).to_plan(session).literal -) + cmd.sql_command.pos_arguments.append(self._to_expr(session, v)) return cmd diff --git a/python/pyspark/sql/connect/session.py b/python/pyspark/sql/connect/session.py index 
7582fe86ff2..e5d1d95a699 100644 --- a/python/pyspark/sql/connect/session.py +++ b/python/pyspark/sql/connect/session.py @@ -557,7 +557,7 @@ class SparkSession: if "sql_command_result" in properties: return DataFrame.withPlan(CachedRelation(properties["sql_command_result"]), self) else: -return DataFrame.withPlan(SQL(sqlQuery, args), self) +return DataFrame.withPlan(cmd, self) sql.__doc__ = PySparkSession.sql.__doc__ diff --git a/python/pyspark/sql/tests/connect/test_connect_basic.py b/python/pyspark/sql/tests/connect/test_connect_basic.py index 2b979570618..c5a127136d6 100644 --- a/python/pyspark/sql/te
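The `_to_expr` helper added above dispatches on the argument type: a `Column` (for example a `map()`/`array()` constructor) is converted via its own plan, while any other value is wrapped as a literal. A minimal stand-alone model of that dispatch follows; the `Column` class here is a stand-in for illustration, not the real pyspark one:

```python
class Column:
    """Minimal stand-in for a pyspark Column carrying an expression string."""
    def __init__(self, expr):
        self.expr = expr

def to_expr(v):
    """Mirror the shape of the `_to_expr` helper in the diff above:
    Column-valued arguments pass through as expressions, everything else
    becomes a literal."""
    if isinstance(v, Column):
        return ("expression", v.expr)  # e.g. map()/array()/struct() constructors
    return ("literal", v)              # plain Python values become SQL literals
```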
[spark] branch master updated: [SPARK-43254][SQL] Assign a name to the error _LEGACY_ERROR_TEMP_2018
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8b967e191b7 [SPARK-43254][SQL] Assign a name to the error _LEGACY_ERROR_TEMP_2018 8b967e191b7 is described below commit 8b967e191b755d7f2830c15d382c83ce7aeb69c1 Author: dengziming AuthorDate: Thu Sep 21 10:22:37 2023 +0300 [SPARK-43254][SQL] Assign a name to the error _LEGACY_ERROR_TEMP_2018 ### What changes were proposed in this pull request? Assign the name `CLASS_UNSUPPORTED_BY_MAP_OBJECTS` to the legacy error class `_LEGACY_ERROR_TEMP_2018`. ### Why are the changes needed? To assign proper name as a part of activity in SPARK-37935 ### Does this PR introduce _any_ user-facing change? Yes, the error message will include the error class name ### How was this patch tested? Add a unit test to produce the error from user code. ### Was this patch authored or co-authored using generative AI tooling? No Closes #42939 from dengziming/SPARK-43254. Authored-by: dengziming Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 10 +++--- docs/sql-error-conditions.md | 6 .../sql/catalyst/encoders/ExpressionEncoder.scala | 2 +- .../spark/sql/errors/QueryExecutionErrors.scala| 2 +- .../expressions/ObjectExpressionsSuite.scala | 11 +++--- .../scala/org/apache/spark/sql/DatasetSuite.scala | 40 -- 6 files changed, 57 insertions(+), 14 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index d92ccfce5c5..8942d3755e9 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -344,6 +344,11 @@ ], "sqlState" : "22003" }, + "CLASS_UNSUPPORTED_BY_MAP_OBJECTS" : { +"message" : [ + "`MapObjects` does not support the class as resulting collection." 
+] + }, "CODEC_NOT_AVAILABLE" : { "message" : [ "The codec is not available. Consider to set the config to ." @@ -4944,11 +4949,6 @@ "not resolved." ] }, - "_LEGACY_ERROR_TEMP_2018" : { -"message" : [ - "class `` is not supported by `MapObjects` as resulting collection." -] - }, "_LEGACY_ERROR_TEMP_2020" : { "message" : [ "Couldn't find a valid constructor on ." diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md index 1df00f72bc9..f6f94efc2b0 100644 --- a/docs/sql-error-conditions.md +++ b/docs/sql-error-conditions.md @@ -297,6 +297,12 @@ The value `` of the type `` cannot be cast to `` Fail to assign a value of `` type to the `` type column or variable `` due to an overflow. Use `try_cast` on the input value to tolerate overflow and return NULL instead. +### CLASS_UNSUPPORTED_BY_MAP_OBJECTS + +SQLSTATE: none assigned + +`MapObjects` does not support the class `` as resulting collection. + ### CODEC_NOT_AVAILABLE SQLSTATE: none assigned diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala index ff72b5a0d96..74d7a5e7a67 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala @@ -170,7 +170,7 @@ object ExpressionEncoder { * Function that deserializes an [[InternalRow]] into an object of type `T`. This class is not * thread-safe. 
*/ - class Deserializer[T](private val expressions: Seq[Expression]) + class Deserializer[T](val expressions: Seq[Expression]) extends (InternalRow => T) with Serializable { @transient private[this] var constructProjection: Projection = _ diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala index e14fef1fad7..84472490128 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala @@ -422,7 +422,7 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase with ExecutionE def classUnsupportedByMapObjectsError(cls: Class[_]): SparkRuntimeException = { new SparkRuntimeException( - errorClass = "_LEGACY_ERROR_TEMP_2018", + errorClass = "CLASS_UNSUPPORTED_BY_MAP_OBJECTS&qu
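Error classes such as the renamed `CLASS_UNSUPPORTED_BY_MAP_OBJECTS` are message templates with `<param>` placeholders stored in `error-classes.json`. A rough sketch of how such a template is rendered (the dictionary below is an illustrative excerpt and `format_error` is a hypothetical helper, not Spark's actual error-class reader):

```python
import re

# Illustrative excerpt modeled on error-classes.json.
ERROR_CLASSES = {
    "CLASS_UNSUPPORTED_BY_MAP_OBJECTS": {
        "message": ["`MapObjects` does not support the class <cls> as resulting collection."],
    },
}

def format_error(error_class, params):
    """Render an error-class template, substituting each <param> placeholder."""
    template = " ".join(ERROR_CLASSES[error_class]["message"])
    return re.sub(r"<(\w+)>", lambda m: str(params[m.group(1)]), template)
```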
[spark] branch master updated: [SPARK-45316][CORE][SQL] Add new parameters `ignoreCorruptFiles`/`ignoreMissingFiles` to `HadoopRDD` and `NewHadoopRDD`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 60d02b444e2 [SPARK-45316][CORE][SQL] Add new parameters `ignoreCorruptFiles`/`ignoreMissingFiles` to `HadoopRDD` and `NewHadoopRDD` 60d02b444e2 is described below commit 60d02b444e2225b3afbe4955dabbea505e9f769c Author: Max Gekk AuthorDate: Tue Sep 26 17:33:07 2023 +0300 [SPARK-45316][CORE][SQL] Add new parameters `ignoreCorruptFiles`/`ignoreMissingFiles` to `HadoopRDD` and `NewHadoopRDD` ### What changes were proposed in this pull request? In the PR, I propose to add new parameters `ignoreCorruptFiles`/`ignoreMissingFiles` to `HadoopRDD` and `NewHadoopRDD`, and set them to the current values of: - `spark.files.ignoreCorruptFiles`/`ignoreMissingFiles` in Spark `core`, - `spark.sql.files.ignoreCorruptFiles`/`ignoreMissingFiles` when the RDDs are created in Spark SQL. ### Why are the changes needed? 1. To make `HadoopRDD` and `NewHadoopRDD` consistent with other RDDs like `FileScanRDD` created by Spark SQL that take into account the SQL configs `spark.sql.files.ignoreCorruptFiles`/`ignoreMissingFiles`. 2. To improve user experience with Spark SQL, so users can control the ignoring of missing files without re-creating the Spark context. ### Does this PR introduce _any_ user-facing change? Yes, `HadoopRDD`/`NewHadoopRDD` invoked by SQL code such as Hive table scans respect the SQL configs `spark.sql.files.ignoreCorruptFiles`/`ignoreMissingFiles` and don't respect the core configs `spark.files.ignoreCorruptFiles`/`ignoreMissingFiles`. ### How was this patch tested? By running the affected tests: ``` $ build/sbt "test:testOnly *QueryPartitionSuite" $ build/sbt "test:testOnly *FileSuite" $ build/sbt "test:testOnly *FileBasedDataSourceSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. 
Closes #43097 from MaxGekk/dynamic-ignoreMissingFiles. Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../scala/org/apache/spark/rdd/HadoopRDD.scala | 31 ++ .../scala/org/apache/spark/rdd/NewHadoopRDD.scala | 27 +++ docs/sql-migration-guide.md| 1 + .../org/apache/spark/sql/hive/TableReader.scala| 9 --- .../spark/sql/hive/QueryPartitionSuite.scala | 6 ++--- 5 files changed, 58 insertions(+), 16 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala index cad107256c5..0b5f6a3d716 100644 --- a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala +++ b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala @@ -89,6 +89,8 @@ private[spark] class HadoopPartition(rddId: Int, override val index: Int, s: Inp * @param keyClass Class of the key associated with the inputFormatClass. * @param valueClass Class of the value associated with the inputFormatClass. * @param minPartitions Minimum number of HadoopRDD partitions (Hadoop Splits) to generate. + * @param ignoreCorruptFiles Whether to ignore corrupt files. + * @param ignoreMissingFiles Whether to ignore missing files. 
* * @note Instantiating this class directly is not recommended, please use * `org.apache.spark.SparkContext.hadoopRDD()` @@ -101,13 +103,36 @@ class HadoopRDD[K, V]( inputFormatClass: Class[_ <: InputFormat[K, V]], keyClass: Class[K], valueClass: Class[V], -minPartitions: Int) +minPartitions: Int, +ignoreCorruptFiles: Boolean, +ignoreMissingFiles: Boolean) extends RDD[(K, V)](sc, Nil) with Logging { if (initLocalJobConfFuncOpt.isDefined) { sparkContext.clean(initLocalJobConfFuncOpt.get) } + def this( + sc: SparkContext, + broadcastedConf: Broadcast[SerializableConfiguration], + initLocalJobConfFuncOpt: Option[JobConf => Unit], + inputFormatClass: Class[_ <: InputFormat[K, V]], + keyClass: Class[K], + valueClass: Class[V], + minPartitions: Int) = { +this( + sc, + broadcastedConf, + initLocalJobConfFuncOpt, + inputFormatClass, + keyClass, + valueClass, + minPartitions, + ignoreCorruptFiles = sc.conf.get(IGNORE_CORRUPT_FILES), + ignoreMissingFiles = sc.conf.get(IGNORE_MISSING_FILES) +) + } + def this( sc: SparkContext, conf: JobConf, @@ -135,10 +160,6 @@ class HadoopRDD[K, V]( private val shouldCloneJobConf = sparkContext.conf.getBoolean("spark.hadoop.cloneConf", false) - private val ignoreCorruptFiles = sparkContext.conf.get(IGNORE_CORRUPT_FILES) - - private val ignoreMissingFiles = sparkContext.conf.get(IGNORE_MISSING_FILES) - private val ignoreEmptySplits =
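The behaviour the two new flags control can be modelled outside of Spark as well. The sketch below is a hypothetical plain-Python reader that skips missing or unreadable files when asked to, mirroring what `ignoreMissingFiles`/`ignoreCorruptFiles` do per RDD (Spark applies them inside `HadoopRDD`'s record reader rather than reading files like this):

```python
def read_all_lines(paths, ignore_missing_files=False, ignore_corrupt_files=False):
    """Read lines from many files, optionally tolerating problem files.

    Illustrative model only: a missing file is skipped when
    ignore_missing_files is set, and an unreadable/undecodable one is
    skipped when ignore_corrupt_files is set; otherwise the error surfaces.
    """
    lines = []
    for p in paths:
        try:
            with open(p, encoding="utf-8") as f:
                lines.extend(f.read().splitlines())
        except FileNotFoundError:
            if not ignore_missing_files:
                raise
        except (OSError, UnicodeDecodeError):
            if not ignore_corrupt_files:
                raise
    return lines
```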
[spark] branch master updated: [SPARK-45340][SQL] Remove the SQL config `spark.sql.hive.verifyPartitionPath`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new eff46ea77e9 [SPARK-45340][SQL] Remove the SQL config `spark.sql.hive.verifyPartitionPath` eff46ea77e9 is described below commit eff46ea77e9bebef3076277bef1e086833dd Author: Max Gekk AuthorDate: Wed Sep 27 08:28:45 2023 +0300 [SPARK-45340][SQL] Remove the SQL config `spark.sql.hive.verifyPartitionPath` ### What changes were proposed in this pull request? In the PR, I propose to remove already deprecated SQL config `spark.sql.hive.verifyPartitionPath`, and the code under the config. The config has been deprecated since Spark 3.0. ### Why are the changes needed? To improve code maintainability by remove unused code. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running the modified test suite: ``` $ build/sbt "test:testOnly *SQLConfSuite" $ build/sbt "test:testOnly *QueryPartitionSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43130 from MaxGekk/remove-verifyPartitionPath. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../org/apache/spark/sql/internal/SQLConf.scala| 17 ++--- .../apache/spark/sql/internal/SQLConfSuite.scala | 4 +-- .../org/apache/spark/sql/hive/TableReader.scala| 41 +- .../spark/sql/hive/QueryPartitionSuite.scala | 12 ++- 4 files changed, 8 insertions(+), 66 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 43eb0756d8d..aeef531dbcd 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -34,7 +34,6 @@ import org.apache.hadoop.fs.Path import org.apache.spark.{ErrorMessageFormat, SparkConf, SparkContext, TaskContext} import org.apache.spark.internal.Logging import org.apache.spark.internal.config._ -import org.apache.spark.internal.config.{IGNORE_MISSING_FILES => SPARK_IGNORE_MISSING_FILES} import org.apache.spark.network.util.ByteUnit import org.apache.spark.sql.catalyst.ScalaReflection import org.apache.spark.sql.catalyst.analysis.{HintErrorLogger, Resolver} @@ -1261,14 +1260,6 @@ object SQLConf { .booleanConf .createWithDefault(false) - val HIVE_VERIFY_PARTITION_PATH = buildConf("spark.sql.hive.verifyPartitionPath") -.doc("When true, check all the partition paths under the table\'s root directory " + - "when reading data stored in HDFS. 
This configuration will be deprecated in the future " + - s"releases and replaced by ${SPARK_IGNORE_MISSING_FILES.key}.") -.version("1.4.0") -.booleanConf -.createWithDefault(false) - val HIVE_METASTORE_DROP_PARTITION_BY_NAME = buildConf("spark.sql.hive.dropPartitionByName.enabled") .doc("When true, Spark will get partition name rather than partition object " + @@ -4472,8 +4463,6 @@ object SQLConf { PANDAS_GROUPED_MAP_ASSIGN_COLUMNS_BY_NAME.key, "2.4", "The config allows to switch to the behaviour before Spark 2.4 " + "and will be removed in the future releases."), - DeprecatedConfig(HIVE_VERIFY_PARTITION_PATH.key, "3.0", -s"This config is replaced by '${SPARK_IGNORE_MISSING_FILES.key}'."), DeprecatedConfig(ARROW_EXECUTION_ENABLED.key, "3.0", s"Use '${ARROW_PYSPARK_EXECUTION_ENABLED.key}' instead of it."), DeprecatedConfig(ARROW_FALLBACK_ENABLED.key, "3.0", @@ -4552,7 +4541,9 @@ object SQLConf { RemovedConfig("spark.sql.ansi.strictIndexOperator", "3.4.0", "true", "This was an internal configuration. It is not needed anymore since Spark SQL always " + "returns null when getting a map value with a non-existing key. See SPARK-40066 " + - "for more details.") + "for more details."), + RemovedConfig("spark.sql.hive.verifyPartitionPath", "4.0.0", "false", +s"This config was replaced by '${IGNORE_MISSING_FILES.key}'.") ) Map(configs.map { cfg => cfg.key -> cfg } : _*) @@ -4766,8 +4757,6 @@ class SQLConf extends Serializable with Logging with SqlApiConf { def isOrcSchemaMergingEnabled: Boolean = getConf(ORC_SCHEMA_MERGING_ENABLED) - def verifyPartitionPath: Boolean = getConf(HIVE_VERIFY_PAR
[spark] branch master updated: [MINOR][SQL] Remove duplicate cases of escaping characters in string literals
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ee21b12c395 [MINOR][SQL] Remove duplicate cases of escaping characters in string literals ee21b12c395 is described below commit ee21b12c395ac184c8ddc2f74b66f6e6285de5fa Author: Max Gekk AuthorDate: Thu Sep 28 21:18:40 2023 +0300 [MINOR][SQL] Remove duplicate cases of escaping characters in string literals ### What changes were proposed in this pull request? In the PR, I propose to remove some cases in `appendEscapedChar()` because they fall to the default case. The following tests check the cases: - https://github.com/apache/spark/blob/187e9a851758c0e9cec11edab2bc07d6f4404001/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ParserUtilsSuite.scala#L97-L98 - https://github.com/apache/spark/blob/187e9a851758c0e9cec11edab2bc07d6f4404001/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ParserUtilsSuite.scala#L104 ### Why are the changes needed? To improve code maintainability. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running the affected test suite: ``` $ build/sbt "test:testOnly *.ParserUtilsSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43170 from MaxGekk/cleanup-escaping. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../scala/org/apache/spark/sql/catalyst/util/SparkParserUtils.scala| 3 --- 1 file changed, 3 deletions(-) diff --git a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkParserUtils.scala b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkParserUtils.scala index c318f208255..a4ce5fb1203 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkParserUtils.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkParserUtils.scala @@ -38,14 +38,11 @@ trait SparkParserUtils { def appendEscapedChar(n: Char): Unit = { n match { case '0' => sb.append('\u0000') -case '\'' => sb.append('\'') -case '"' => sb.append('\"') case 'b' => sb.append('\b') case 'n' => sb.append('\n') case 'r' => sb.append('\r') case 't' => sb.append('\t') case 'Z' => sb.append('\u001A') -case '\\' => sb.append('\\') // The following 2 lines are exactly what MySQL does TODO: why do we do this? case '%' => sb.append("\\%") case '_' => sb.append("\\_") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
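The fall-through behaviour that makes the deleted cases redundant can be sketched outside of Spark. The helper below is a hypothetical mirror of `appendEscapedChar()`, not the actual `SparkParserUtils` code:

```java
public class EscapeDemo {
    // The quote, double-quote, and backslash cases were removable because the
    // default branch already appends the character unchanged.
    static void appendEscapedChar(StringBuilder sb, char n) {
        switch (n) {
            case '0': sb.append('\0'); break;      // NUL
            case 'b': sb.append('\b'); break;
            case 'n': sb.append('\n'); break;
            case 'r': sb.append('\r'); break;
            case 't': sb.append('\t'); break;
            case 'Z': sb.append('\u001A'); break;  // SUB (end-of-file marker)
            // MySQL-compatible: keep the backslash before SQL wildcards
            case '%': sb.append("\\%"); break;
            case '_': sb.append("\\_"); break;
            // '\'' , '"' and '\\' all land here and are appended as-is
            default:  sb.append(n); break;
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (char c : new char[] {'n', '\'', '"', '\\', '%'}) {
            appendEscapedChar(sb, c);
        }
        System.out.println(sb.toString().equals("\n'\"\\\\%")); // true
    }
}
```

This is why deleting the dedicated `'\''`, `'"'`, and `'\\'` cases changes nothing observable: the default case produces the identical output.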
[spark] branch master updated: [SPARK-45398][SQL] Append `ESCAPE` in `sql()` of the `Like` expression
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new cc4ecb5104e [SPARK-45398][SQL] Append `ESCAPE` in `sql()` of the `Like` expression cc4ecb5104e is described below commit cc4ecb5104e37d5e530d44b41fc1d8f8116e37d8 Author: Max Gekk AuthorDate: Wed Oct 4 11:35:05 2023 +0300 [SPARK-45398][SQL] Append `ESCAPE` in `sql()` of the `Like` expression ### What changes were proposed in this pull request? In the PR, I propose to fix the `sql()` method of the `Like` expression, and append the `ESCAPE` clause when the `escapeChar` is not the default one `\\`. ### Why are the changes needed? 1. To be consistent to the `toString()` method 2. To distinguish column names when the escape argument is set. Before the changes, columns might conflict like the example below, and that could confuse users: ```sql spark-sql (default)> create temp view tbl as (SELECT 'a|_' like 'a||_' escape '|', 'a|_' like 'a||_' escape 'a'); [COLUMN_ALREADY_EXISTS] The column `a|_ like a||_` already exists. Consider to choose another name or rename the existing column. ``` ### Does this PR introduce _any_ user-facing change? Should not. ### How was this patch tested? Manually checking the column name by: ```sql spark-sql (default)> create temp view tbl as (SELECT 'a|_' like 'a||_' escape '|', 'a|_' like 'a||_' escape 'a'); Time taken: 0.531 seconds spark-sql (default)> describe extended tbl; a|_ LIKE a||_ ESCAPE '|'boolean a|_ LIKE a||_ ESCAPE 'a'boolean ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43196 from MaxGekk/fix-like-sql. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../explain-results/function_like_with_escape.explain | 2 +- .../spark/sql/catalyst/expressions/regexpExpressions.scala| 11 +++ 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_like_with_escape.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_like_with_escape.explain index 471a3a4bd52..1a15a27d97e 100644 --- a/connector/connect/common/src/test/resources/query-tests/explain-results/function_like_with_escape.explain +++ b/connector/connect/common/src/test/resources/query-tests/explain-results/function_like_with_escape.explain @@ -1,2 +1,2 @@ -Project [g#0 LIKE g#0 ESCAPE '/' AS g LIKE g#0] +Project [g#0 LIKE g#0 ESCAPE '/' AS g LIKE g ESCAPE '/'#0] +- LocalRelation , [id#0L, a#0, b#0, d#0, e#0, f#0, g#0] diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala index 5ebfdd919b8..69d90296d7f 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala @@ -133,12 +133,15 @@ case class Like(left: Expression, right: Expression, escapeChar: Char) final override val nodePatterns: Seq[TreePattern] = Seq(LIKE_FAMLIY) - override def toString: String = escapeChar match { -case '\\' => s"$left LIKE $right" -case c => s"$left LIKE $right ESCAPE '$c'" + override def toString: String = { +val escapeSuffix = if (escapeChar == '\\') "" else s" ESCAPE '$escapeChar'" +s"$left ${prettyName.toUpperCase(Locale.ROOT)} $right" + escapeSuffix } - override def sql: String = s"${left.sql} ${prettyName.toUpperCase(Locale.ROOT)} ${right.sql}" + override def sql: String = { +val escapeSuffix = if (escapeChar == '\\') "" else 
s" ESCAPE ${Literal(escapeChar).sql}" +s"${left.sql} ${prettyName.toUpperCase(Locale.ROOT)} ${right.sql}" + escapeSuffix + } override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val patternClass = classOf[Pattern].getName
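The fixed `sql()`/`toString()` logic can be illustrated with a stand-alone sketch. The `likeSql` helper below is hypothetical; the real Spark code renders the escape character via `Literal(escapeChar).sql`:

```java
public class LikeSqlDemo {
    // Append the ESCAPE clause only when the escape character is not the
    // default backslash, so the two columns in the example above get
    // distinct generated names.
    static String likeSql(String left, String right, char escapeChar) {
        String escapeSuffix =
            (escapeChar == '\\') ? "" : " ESCAPE '" + escapeChar + "'";
        return left + " LIKE " + right + escapeSuffix;
    }

    public static void main(String[] args) {
        System.out.println(likeSql("'a|_'", "'a||_'", '|'));  // 'a|_' LIKE 'a||_' ESCAPE '|'
        System.out.println(likeSql("'a|_'", "'a||_'", '\\')); // 'a|_' LIKE 'a||_'
    }
}
```

With the suffix included, the two expressions in the `COLUMN_ALREADY_EXISTS` example render to different strings, so their auto-generated column names no longer collide.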
[spark] branch master updated: [SPARK-45400][SQL][DOCS] Refer to the unescaping rules from expression descriptions
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c0d9ca3be14c [SPARK-45400][SQL][DOCS] Refer to the unescaping rules from expression descriptions c0d9ca3be14c is described below commit c0d9ca3be14cb0ec8d8f9920d3ecc4aac3cf5adc Author: Max Gekk AuthorDate: Thu Oct 5 22:22:29 2023 +0300 [SPARK-45400][SQL][DOCS] Refer to the unescaping rules from expression descriptions ### What changes were proposed in this pull request? In the PR, I propose to refer to the unescaping rules added by https://github.com/apache/spark/pull/43152 from expression descriptions like in `Like`, see https://github.com/apache/spark/assets/1580697/6a332b50-f2c8-4549-848a-61519c9f964e ### Why are the changes needed? To improve user experience w/ Spark SQL. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manually generated docs and checked visually. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43203 from MaxGekk/link-to-escape-doc. Authored-by: Max Gekk Signed-off-by: Max Gekk --- docs/sql-ref-literals.md | 2 + .../catalyst/expressions/regexpExpressions.scala | 70 ++ 2 files changed, 47 insertions(+), 25 deletions(-) diff --git a/docs/sql-ref-literals.md b/docs/sql-ref-literals.md index e9447af71c54..2a02a22bd6f0 100644 --- a/docs/sql-ref-literals.md +++ b/docs/sql-ref-literals.md @@ -62,6 +62,8 @@ The following escape sequences are recognized in regular string literals (withou - `\_` -> `\_`; - `\<other char>` -> `<other char>`, skip the slash and leave the character as is. +The unescaping rules above can be turned off by setting the SQL config `spark.sql.parser.escapedStringLiterals` to `true`. 
+ Examples ```sql diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala index 69d90296d7ff..87ea8b5a102a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala @@ -77,7 +77,7 @@ abstract class StringRegexExpression extends BinaryExpression } } -// scalastyle:off line.contains.tab +// scalastyle:off line.contains.tab line.size.limit /** * Simple RegEx pattern matching function */ @@ -92,11 +92,14 @@ abstract class StringRegexExpression extends BinaryExpression _ matches any one character in the input (similar to . in posix regular expressions)\ % matches zero or more characters in the input (similar to .* in posix regular expressions) - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order - to match "\abc", the pattern should be "\\abc". + Since Spark 2.0, string literals are unescaped in our SQL parser, see the unescaping + rules at https://spark.apache.org/docs/latest/sql-ref-literals.html#string-literal";>String Literal. + For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it falls back to Spark 1.6 behavior regarding string literal parsing. For example, if the config is - enabled, the pattern to match "\abc" should be "\abc". + enabled, the pattern to match "\abc" should be "\abc". + It's recommended to use a raw string literal (with the `r` prefix) to avoid escaping + special characters in the pattern string if exists. * escape - an character added since Spark 3.0. The default escape character is the '\'. If an escape character precedes a special symbol or another escape character, the following character is matched literally. 
It is invalid to escape any other character. @@ -121,7 +124,7 @@ abstract class StringRegexExpression extends BinaryExpression """, since = "1.0.0", group = "predicate_funcs") -// scalastyle:on line.contains.tab +// scalastyle:on line.contains.tab line.size.limit case class Like(left: Expression, right: Expression, escapeChar: Char) extends StringRegexExpression { @@ -207,11 +210,14 @@ case class Like(left: Expression, right: Expression, escapeChar: Char) _ matches any one character in the input (similar to . in posix regular expressions) % matches zero or more characters in the
[spark] branch master updated: [SPARK-45262][SQL][TESTS][DOCS] Improve examples for regexp parameters
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e3b1bb117fe9 [SPARK-45262][SQL][TESTS][DOCS] Improve examples for regexp parameters e3b1bb117fe9 is described below commit e3b1bb117fe9bf0b17321e6359b7aa90f70a24b5 Author: Max Gekk AuthorDate: Fri Oct 6 22:34:40 2023 +0300 [SPARK-45262][SQL][TESTS][DOCS] Improve examples for regexp parameters ### What changes were proposed in this pull request? In the PR, I propose to add a few more examples for `LIKE`, `ILIKE`, `RLIKE`, `regexp_instr()`, `regexp_extract_all()` that highlight correctness of current description and test a couple more of corner cases. ### Why are the changes needed? The description of `LIKE` says: ``` ... in order to match "\abc", the pattern should be "\\abc" ``` but in Spark SQL shell: ```sql spark-sql (default)> SELECT c FROM t; \abc spark-sql (default)> SELECT c LIKE "\\abc" FROM t; [INVALID_FORMAT.ESC_IN_THE_MIDDLE] The format is invalid: '\\abc'. The escape character is not allowed to precede 'a'. spark-sql (default)> SELECT c LIKE "abc" FROM t; true ``` So, the description might confuse users since the pattern must contain 4 slashes when the pattern is a regular SQL string. New example shows that the pattern "\\abc" is correct if we take into account the string as a raw string: ```sql spark-sql (default)> SELECT c LIKE R"\\abc" FROM t; true ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running new and modified tests: ``` $ build/sbt "test:testOnly *.StringFunctionsSuite" $ build/sbt "sql/test:testOnly org.apache.spark.sql.expressions.ExpressionInfoSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43037 from MaxGekk/fix-like-doc. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../sql/catalyst/expressions/regexpExpressions.scala | 18 -- .../resources/sql-functions/sql-expression-schema.md | 2 +- .../org/apache/spark/sql/StringFunctionsSuite.scala| 5 + 3 files changed, 22 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala index 87ea8b5a102a..b33de303b5d5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala @@ -108,13 +108,15 @@ abstract class StringRegexExpression extends BinaryExpression Examples: > SELECT _FUNC_('Spark', '_park'); true + > SELECT '\\abc' AS S, S _FUNC_ r'\\abc', S _FUNC_ 'abc'; + \abc truetrue > SET spark.sql.parser.escapedStringLiterals=true; spark.sql.parser.escapedStringLiterals true > SELECT '%SystemDrive%\Users\John' _FUNC_ '\%SystemDrive\%\\Users%'; true > SET spark.sql.parser.escapedStringLiterals=false; spark.sql.parser.escapedStringLiterals false - > SELECT '%SystemDrive%\\Users\\John' _FUNC_ '\%SystemDrive\%Users%'; + > SELECT '%SystemDrive%\\Users\\John' _FUNC_ r'%SystemDrive%\\Users%'; true > SELECT '%SystemDrive%/Users/John' _FUNC_ '/%SystemDrive/%//Users%' ESCAPE '/'; true @@ -226,13 +228,15 @@ case class Like(left: Expression, right: Expression, escapeChar: Char) Examples: > SELECT _FUNC_('Spark', '_Park'); true + > SELECT '\\abc' AS S, S _FUNC_ r'\\abc', S _FUNC_ 'abc'; + \abc truetrue > SET spark.sql.parser.escapedStringLiterals=true; spark.sql.parser.escapedStringLiterals true > SELECT '%SystemDrive%\Users\John' _FUNC_ '\%SystemDrive\%\\users%'; true > SET spark.sql.parser.escapedStringLiterals=false; spark.sql.parser.escapedStringLiterals false - > SELECT '%SystemDrive%\\USERS\\John' _FUNC_ '\%SystemDrive\%Users%'; + > 
SELECT '%SystemDrive%\\USERS\\John' _FUNC_ r'%SystemDrive%\\Users%'; true > SELECT '%SystemDrive%/Users/John' _FUNC_ '/%SYSTEMDrive/%//Users%' ESCAPE '/'
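The wildcard and escape semantics exercised by these examples can be sketched as a small LIKE-to-regex translator. This is a hypothetical helper (Spark's actual translation lives in `StringUtils.escapeLikeRegex`, which additionally rejects invalid escape sequences such as the `ESC_IN_THE_MIDDLE` case shown above; this sketch skips that validation):

```java
import java.util.regex.Pattern;

public class LikeToRegexDemo {
    // '_' matches any single character, '%' any run of characters; the escape
    // character makes the following character literal.
    static String likeToRegex(String pattern, char escapeChar) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == escapeChar && i + 1 < pattern.length()) {
                // escaped character is matched literally
                out.append(Pattern.quote(String.valueOf(pattern.charAt(++i))));
            } else if (c == '_') {
                out.append('.');
            } else if (c == '%') {
                out.append(".*");
            } else {
                out.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 'a|_' LIKE 'a||_' ESCAPE '|' -> true: '||' is a literal '|', '_' is a wildcard
        System.out.println(Pattern.matches(likeToRegex("a||_", '|'), "a|_"));   // true
        System.out.println(Pattern.matches(likeToRegex("a%c", '\\'), "abbbc")); // true
    }
}
```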
[spark] branch master updated: [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4493b431192 [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match 4493b431192 is described below commit 4493b431192fcdbab1379b7ffb89eea0cdaa19f1 Author: Jia Fan AuthorDate: Mon Oct 9 12:30:20 2023 +0300 [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match ### What changes were proposed in this pull request? When a custom pattern is used to parse a timestamp and only a prefix of the input matches the pattern, `Iso8601TimestampFormatter::parseOptional` and `Iso8601TimestampFormatter::parseWithoutTimeZoneOptional` should not return a non-empty result. E.g. for pattern = `yyyy-MM-dd HH:mm:ss` and value = `9999-12-31 23:59:59.999`: the pattern parses `9999-12-31 23:59:59` normally, but the value has the trailing suffix `.999`, so a non-empty result must not be returned. This bug affects schema inference in CSV/JSON. ### Why are the changes needed? Fix a schema-inference bug. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added a new test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43245 from Hisoka-X/SPARK-45424-inference-schema-unresolved. 
Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../apache/spark/sql/catalyst/util/TimestampFormatter.scala| 10 ++ .../spark/sql/catalyst/util/TimestampFormatterSuite.scala | 10 ++ 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala index 8a288d0e9f3..55eee41c14c 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala @@ -167,8 +167,9 @@ class Iso8601TimestampFormatter( override def parseOptional(s: String): Option[Long] = { try { - val parsed = formatter.parseUnresolved(s, new ParsePosition(0)) - if (parsed != null) { + val parsePosition = new ParsePosition(0) + val parsed = formatter.parseUnresolved(s, parsePosition) + if (parsed != null && s.length == parsePosition.getIndex) { Some(extractMicros(parsed)) } else { None @@ -196,8 +197,9 @@ class Iso8601TimestampFormatter( override def parseWithoutTimeZoneOptional(s: String, allowTimeZone: Boolean): Option[Long] = { try { - val parsed = formatter.parseUnresolved(s, new ParsePosition(0)) - if (parsed != null) { + val parsePosition = new ParsePosition(0) + val parsed = formatter.parseUnresolved(s, parsePosition) + if (parsed != null && s.length == parsePosition.getIndex) { Some(extractMicrosNTZ(s, parsed, allowTimeZone)) } else { None diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala index ecd849dd3af..d2fc89a034f 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala @@ -491,4 +491,14 @@ class TimestampFormatterSuite 
extends DatetimeFormatterSuite { assert(simpleFormatter.parseOptional("abc").isEmpty) } + + test("SPARK-45424: do not return optional parse results when only prefix match") { +val formatter = new Iso8601TimestampFormatter( + "yyyy-MM-dd HH:mm:ss", + locale = DateFormatter.defaultLocale, + legacyFormat = LegacyDateFormats.SIMPLE_DATE_FORMAT, + isParsing = true, zoneId = DateTimeTestUtils.LA) +assert(formatter.parseOptional("9999-12-31 23:59:59.999").isEmpty) +assert(formatter.parseWithoutTimeZoneOptional("9999-12-31 23:59:59.999", true).isEmpty) + } }
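The core of the fix, checking that `parseUnresolved()` consumed the whole input, can be reproduced with plain `java.time`. This is a stand-alone sketch of the same guard, not Spark's `Iso8601TimestampFormatter` itself (which also converts the parsed result to microseconds):

```java
import java.text.ParsePosition;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.util.Optional;

public class PrefixParseDemo {
    // parseUnresolved() happily stops at a matching prefix; comparing the final
    // parse index with the input length rejects inputs with trailing characters.
    static Optional<TemporalAccessor> parseOptional(DateTimeFormatter formatter, String s) {
        ParsePosition pos = new ParsePosition(0);
        TemporalAccessor parsed = formatter.parseUnresolved(s, pos);
        if (parsed != null && s.length() == pos.getIndex()) {
            return Optional.of(parsed);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        DateTimeFormatter f = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        System.out.println(parseOptional(f, "9999-12-31 23:59:59").isPresent());     // true
        System.out.println(parseOptional(f, "9999-12-31 23:59:59.999").isPresent()); // false: ".999" left over
    }
}
```

Without the index check, the second call would return a value parsed from only the prefix, which is exactly what broke CSV/JSON schema inference.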
[spark] branch branch-3.5 updated: [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 5f8ae9a3dbd [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match 5f8ae9a3dbd is described below commit 5f8ae9a3dbd2c7624bffd588483c9916c302c081 Author: Jia Fan AuthorDate: Mon Oct 9 12:30:20 2023 +0300 [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match ### What changes were proposed in this pull request? When a custom pattern is used to parse a timestamp and only a prefix of the input matches the pattern, `Iso8601TimestampFormatter::parseOptional` and `Iso8601TimestampFormatter::parseWithoutTimeZoneOptional` should not return a non-empty result. E.g. for pattern = `yyyy-MM-dd HH:mm:ss` and value = `9999-12-31 23:59:59.999`: the pattern parses `9999-12-31 23:59:59` normally, but the value has the trailing suffix `.999`, so a non-empty result must not be returned. This bug affects schema inference in CSV/JSON. ### Why are the changes needed? Fix a schema-inference bug. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added a new test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43245 from Hisoka-X/SPARK-45424-inference-schema-unresolved. 
Authored-by: Jia Fan Signed-off-by: Max Gekk (cherry picked from commit 4493b431192fcdbab1379b7ffb89eea0cdaa19f1) Signed-off-by: Max Gekk --- .../apache/spark/sql/catalyst/util/TimestampFormatter.scala| 10 ++ .../spark/sql/catalyst/util/TimestampFormatterSuite.scala | 10 ++ 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala index 8a288d0e9f3..55eee41c14c 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala @@ -167,8 +167,9 @@ class Iso8601TimestampFormatter( override def parseOptional(s: String): Option[Long] = { try { - val parsed = formatter.parseUnresolved(s, new ParsePosition(0)) - if (parsed != null) { + val parsePosition = new ParsePosition(0) + val parsed = formatter.parseUnresolved(s, parsePosition) + if (parsed != null && s.length == parsePosition.getIndex) { Some(extractMicros(parsed)) } else { None @@ -196,8 +197,9 @@ class Iso8601TimestampFormatter( override def parseWithoutTimeZoneOptional(s: String, allowTimeZone: Boolean): Option[Long] = { try { - val parsed = formatter.parseUnresolved(s, new ParsePosition(0)) - if (parsed != null) { + val parsePosition = new ParsePosition(0) + val parsed = formatter.parseUnresolved(s, parsePosition) + if (parsed != null && s.length == parsePosition.getIndex) { Some(extractMicrosNTZ(s, parsed, allowTimeZone)) } else { None diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala index eb173bc7f8c..2134a0d6ecd 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala +++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala @@ -507,4 +507,14 @@ class TimestampFormatterSuite extends DatetimeFormatterSuite { assert(simpleFormatter.parseOptional("abc").isEmpty) } + + test("SPARK-45424: do not return optional parse results when only prefix match") { +val formatter = new Iso8601TimestampFormatter( + "yyyy-MM-dd HH:mm:ss", + locale = DateFormatter.defaultLocale, + legacyFormat = LegacyDateFormats.SIMPLE_DATE_FORMAT, + isParsing = true, zoneId = DateTimeTestUtils.LA) +assert(formatter.parseOptional("9999-12-31 23:59:59.999").isEmpty) +assert(formatter.parseWithoutTimeZoneOptional("9999-12-31 23:59:59.999", true).isEmpty) + } }
[spark] branch master updated (4493b431192 -> af800b50595)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 4493b431192 [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match add af800b50595 [SPARK-45459][SQL][TESTS][DOCS] Remove the last 2 extra spaces in the automatically generated `sql-error-conditions.md` file No new revisions were added by this update. Summary of changes: core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
[spark] branch branch-3.5 updated: [SPARK-45459][SQL][TESTS][DOCS] Remove the last 2 extra spaces in the automatically generated `sql-error-conditions.md` file
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 4841a404be3 [SPARK-45459][SQL][TESTS][DOCS] Remove the last 2 extra spaces in the automatically generated `sql-error-conditions.md` file 4841a404be3 is described below commit 4841a404be3c37fc16031a0119b321eefcb2faab Author: panbingkun AuthorDate: Mon Oct 9 12:32:14 2023 +0300 [SPARK-45459][SQL][TESTS][DOCS] Remove the last 2 extra spaces in the automatically generated `sql-error-conditions.md` file ### What changes were proposed in this pull request? The pr aims to remove the last 2 extra spaces in the automatically generated `sql-error-conditions.md` file. ### Why are the changes needed? - When I am work on another PR, I use the following command: ``` SPARK_GENERATE_GOLDEN_FILES=1 build/sbt \ "core/testOnly *SparkThrowableSuite -- -t \"Error classes match with document\"" ``` I found that in the automatically generated `sql-error-conditions.md` file, there are 2 extra spaces added at the end, Obviously, this is not what we expected, otherwise we would need to manually remove it, which is not in line with automation. - The git tells us this difference, as follows: https://github.com/apache/spark/assets/15246973/a68b657f-3a00-4405-9623-1f7ab9d44d82";> ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Pass GA. - Manually test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43274 from panbingkun/SPARK-45459. 
Authored-by: panbingkun Signed-off-by: Max Gekk (cherry picked from commit af800b505956ff26e03c5fc56b6cb4ac5c0efe2f) Signed-off-by: Max Gekk --- core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala index 0249cde5488..299bcea3f9e 100644 --- a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala @@ -253,8 +253,7 @@ class SparkThrowableSuite extends SparkFunSuite { | |Also see [SQLSTATE Codes](sql-error-conditions-sqlstates.html). | - |$sqlErrorParentDocContent - |""".stripMargin + |$sqlErrorParentDocContent""".stripMargin errors.filter(_._2.subClass.isDefined).foreach(error => { val name = error._1 @@ -316,7 +315,7 @@ class SparkThrowableSuite extends SparkFunSuite { } FileUtils.writeStringToFile( parentDocPath.toFile, - sqlErrorParentDoc + lineSeparator, + sqlErrorParentDoc, StandardCharsets.UTF_8) } } else {
[spark] branch master updated: [SPARK-45383][SQL] Fix error message for time travel with non-existing table
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ced321c8b5a [SPARK-45383][SQL] Fix error message for time travel with non-existing table ced321c8b5a is described below commit ced321c8b5a32c69dfb2841d4bec8a03f21b8038 Author: Wenchen Fan AuthorDate: Mon Oct 9 22:15:45 2023 +0300 [SPARK-45383][SQL] Fix error message for time travel with non-existing table ### What changes were proposed in this pull request? Fixes a small bug to report `TABLE_OR_VIEW_NOT_FOUND` error correctly for time travel. It was missed before because `RelationTimeTravel` is a leaf node but it may contain `UnresolvedRelation`. ### Why are the changes needed? bug fix ### Does this PR introduce _any_ user-facing change? Yes, the error message becomes reasonable ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #43298 from cloud-fan/time-travel. 
Authored-by: Wenchen Fan Signed-off-by: Max Gekk --- .../apache/spark/sql/catalyst/analysis/CheckAnalysis.scala| 4 .../org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala | 11 +++ 2 files changed, 15 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index e140625f47a..611dd7b3009 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -384,6 +384,9 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB }) operator match { + case RelationTimeTravel(u: UnresolvedRelation, _, _) => +u.tableNotFound(u.multipartIdentifier) + case etw: EventTimeWatermark => etw.eventTime.dataType match { case s: StructType @@ -396,6 +399,7 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB "eventName" -> toSQLId(etw.eventTime.name), "eventType" -> toSQLType(etw.eventTime.dataType))) } + case f: Filter if f.condition.dataType != BooleanType => f.failAnalysis( errorClass = "DATATYPE_MISMATCH.FILTER_NOT_BOOLEAN", diff --git a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala index ae639b272a2..047bc8de739 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala @@ -3014,6 +3014,17 @@ class DataSourceV2SQLSuiteV1Filter sqlState = None, parameters = Map("relationId" -> "`x`")) + checkError( +exception = intercept[AnalysisException] { + sql("SELECT * FROM non_exist VERSION AS OF 1") +}, +errorClass = "TABLE_OR_VIEW_NOT_FOUND", +parameters = Map("relationName" -> "`non_exist`"), +context = 
ExpectedContext( + fragment = "non_exist", + start = 14, + stop = 22)) + val subquery1 = "SELECT 1 FROM non_exist" checkError( exception = intercept[AnalysisException] {
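The root cause (a leaf plan node that hides an unresolved child from the generic traversal) can be sketched abstractly. The classes below are hypothetical stand-ins, not Spark's actual plan hierarchy:

```java
import java.util.Collections;
import java.util.List;

public class LeafNodeDemo {
    interface Node { List<Node> children(); }

    static class UnresolvedRelation implements Node {
        final String name;
        UnresolvedRelation(String name) { this.name = name; }
        public List<Node> children() { return Collections.emptyList(); }
    }

    // A "leaf" node reports no children, so a generic walk over children()
    // never reaches the unresolved relation it wraps; the analyzer needs an
    // explicit case for it, which is what the patch adds to CheckAnalysis.
    static class RelationTimeTravel implements Node {
        final Node table; // hidden child, deliberately not in children()
        RelationTimeTravel(Node table) { this.table = table; }
        public List<Node> children() { return Collections.emptyList(); }
    }

    static String check(Node plan) {
        if (plan instanceof RelationTimeTravel) {
            Node table = ((RelationTimeTravel) plan).table;
            if (table instanceof UnresolvedRelation) {
                return "TABLE_OR_VIEW_NOT_FOUND: `" + ((UnresolvedRelation) table).name + "`";
            }
        }
        return "OK";
    }

    public static void main(String[] args) {
        System.out.println(check(new RelationTimeTravel(new UnresolvedRelation("non_exist"))));
    }
}
```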
[spark] branch branch-3.5 updated: [SPARK-45383][SQL] Fix error message for time travel with non-existing table
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 8bf5a5bca3f [SPARK-45383][SQL] Fix error message for time travel with non-existing table 8bf5a5bca3f is described below commit 8bf5a5bca3f9f7db78182d14e56476d384f442fa Author: Wenchen Fan AuthorDate: Mon Oct 9 22:15:45 2023 +0300 [SPARK-45383][SQL] Fix error message for time travel with non-existing table ### What changes were proposed in this pull request? Fixes a small bug to report `TABLE_OR_VIEW_NOT_FOUND` error correctly for time travel. It was missed before because `RelationTimeTravel` is a leaf node but it may contain `UnresolvedRelation`. ### Why are the changes needed? bug fix ### Does this PR introduce _any_ user-facing change? Yes, the error message becomes reasonable ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #43298 from cloud-fan/time-travel. 
Authored-by: Wenchen Fan Signed-off-by: Max Gekk (cherry picked from commit ced321c8b5a32c69dfb2841d4bec8a03f21b8038) Signed-off-by: Max Gekk --- .../apache/spark/sql/catalyst/analysis/CheckAnalysis.scala| 4 .../org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala | 11 +++ 2 files changed, 15 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 511f3622e7e..533ea8a2b79 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -365,6 +365,9 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB }) operator match { + case RelationTimeTravel(u: UnresolvedRelation, _, _) => +u.tableNotFound(u.multipartIdentifier) + case etw: EventTimeWatermark => etw.eventTime.dataType match { case s: StructType @@ -377,6 +380,7 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB "eventName" -> toSQLId(etw.eventTime.name), "eventType" -> toSQLType(etw.eventTime.dataType))) } + case f: Filter if f.condition.dataType != BooleanType => f.failAnalysis( errorClass = "DATATYPE_MISMATCH.FILTER_NOT_BOOLEAN", diff --git a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala index 06f5600e0d1..7745e9c0a4e 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala @@ -3014,6 +3014,17 @@ class DataSourceV2SQLSuiteV1Filter sqlState = None, parameters = Map("relationId" -> "`x`")) + checkError( +exception = intercept[AnalysisException] { + sql("SELECT * FROM non_exist VERSION AS OF 1") +}, +errorClass = 
"TABLE_OR_VIEW_NOT_FOUND", +parameters = Map("relationName" -> "`non_exist`"), +context = ExpectedContext( + fragment = "non_exist", + start = 14, + stop = 22)) + val subquery1 = "SELECT 1 FROM non_exist" checkError( exception = intercept[AnalysisException] { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (f378b506bf1 -> 76230765674)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from f378b506bf1 [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind
     add 76230765674 [SPARK-45458][SQL] Convert IllegalArgumentException to SparkIllegalArgumentException in bitwiseExpressions

No new revisions were added by this update.

Summary of changes:
 .../src/main/resources/error/error-classes.json| 5 +++
 ...nditions-invalid-parameter-value-error-class.md | 4 +++
 .../catalyst/expressions/bitwiseExpressions.scala | 15 
 .../spark/sql/errors/QueryExecutionErrors.scala| 10 ++
 .../expressions/BitwiseExpressionsSuite.scala | 42 --
 .../resources/sql-tests/results/bitwise.sql.out| 26 +++---
 6 files changed, 79 insertions(+), 23 deletions(-)
[spark] branch master updated: [SPARK-45213][SQL] Assign name to the error _LEGACY_ERROR_TEMP_2151
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6373f19f537 [SPARK-45213][SQL] Assign name to the error _LEGACY_ERROR_TEMP_2151 6373f19f537 is described below commit 6373f19f537f69c6460b2e4097f19903c01a608f Author: dengziming AuthorDate: Tue Oct 10 15:36:18 2023 +0300 [SPARK-45213][SQL] Assign name to the error _LEGACY_ERROR_TEMP_2151 ### What changes were proposed in this pull request? Assign the name `EXPRESSION_DECODING_FAILED` to the legacy error class `_LEGACY_ERROR_TEMP_2151`. ### Why are the changes needed? To assign proper name as a part of activity in SPARK-37935. ### Does this PR introduce _any_ user-facing change? Yes, the error message will include the error class name ### How was this patch tested? An existing unit test to produce the error from user code. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43029 from dengziming/SPARK-45213. Authored-by: dengziming Signed-off-by: Max Gekk --- common/utils/src/main/resources/error/error-classes.json | 11 +-- docs/sql-error-conditions.md | 6 ++ .../org/apache/spark/sql/errors/QueryExecutionErrors.scala| 3 +-- .../spark/sql/catalyst/encoders/EncoderResolutionSuite.scala | 2 +- .../src/test/scala/org/apache/spark/sql/DatasetSuite.scala| 5 ++--- 5 files changed, 15 insertions(+), 12 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 690d1ae1a14..1239793b3f9 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -921,6 +921,11 @@ } } }, + "EXPRESSION_DECODING_FAILED" : { +"message" : [ + "Failed to decode a row to a value of the expressions: ." 
+] + }, "EXPRESSION_TYPE_IS_NOT_ORDERABLE" : { "message" : [ "Column expression cannot be sorted because its type is not orderable." @@ -5524,12 +5529,6 @@ "Due to Scala's limited support of tuple, tuple with more than 22 elements are not supported." ] }, - "_LEGACY_ERROR_TEMP_2151" : { -"message" : [ - "Error while decoding: ", - "." -] - }, "_LEGACY_ERROR_TEMP_2152" : { "message" : [ "Error while encoding: ", diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md index fda10eceb97..b4ee7358b52 100644 --- a/docs/sql-error-conditions.md +++ b/docs/sql-error-conditions.md @@ -551,6 +551,12 @@ The table `` does not support ``. For more details see [EXPECT_VIEW_NOT_TABLE](sql-error-conditions-expect-view-not-table-error-class.html) +### EXPRESSION_DECODING_FAILED + +SQLSTATE: none assigned + +Failed to decode a row to a value of the expressions: ``. + ### EXPRESSION_TYPE_IS_NOT_ORDERABLE SQLSTATE: none assigned diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala index bd4d7a3be7f..5396ae5ff70 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala @@ -1342,9 +1342,8 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase with ExecutionE def expressionDecodingError(e: Exception, expressions: Seq[Expression]): SparkRuntimeException = { new SparkRuntimeException( - errorClass = "_LEGACY_ERROR_TEMP_2151", + errorClass = "EXPRESSION_DECODING_FAILED", messageParameters = Map( -"e" -> e.toString(), "expressions" -> expressions.map( _.simpleString(SQLConf.get.maxToStringFields)).mkString("\n")), cause = e) diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/EncoderResolutionSuite.scala 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/EncoderResolutionSuite.scala index f4106e65e7c..7f54987ee7e 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/EncoderResolutionSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/EncoderResolutionSuite.scala @@ -172,7 +172,7 @@ class EncoderResolutionSuite extends PlanTest { val e = intercept[RuntimeException] { fromRow(InternalRow(new GenericArrayData(Array(1, null } -assert(e.getMessage.contains("Null value appe
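The renamed entry follows the error-class convention in which a placeholder such as `<expressions>` in the JSON message template is filled from `messageParameters`. A hedged sketch of that substitution (illustrative names, not Spark's actual `ErrorClassesJsonReader`):

```scala
// Sketch of template rendering for an error class like the new
// EXPRESSION_DECODING_FAILED entry. The map and function names here are
// illustrative assumptions, not Spark's real implementation.
val templates = Map(
  "EXPRESSION_DECODING_FAILED" ->
    "Failed to decode a row to a value of the expressions: <expressions>.")

def renderError(errorClass: String, params: Map[String, String]): String = {
  val template = templates(errorClass)
  // Replace each <name> placeholder with its parameter value.
  params.foldLeft(template) { case (msg, (k, v)) => msg.replace(s"<$k>", v) }
}

val msg = renderError("EXPRESSION_DECODING_FAILED", Map("expressions" -> "upcast(a)"))
```

This is also why the patch drops the separate `"e"` parameter: the underlying exception now travels as the `cause` rather than being interpolated into the message text.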
[spark] branch master updated (e1a7b84f47b -> ae112e4279f)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from e1a7b84f47b [SPARK-45397][ML][CONNECT] Add array assembler feature transformer
     add ae112e4279f [SPARK-45116][SQL] Add some comment for param of JdbcDialect `createTable`

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala| 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)
[spark] branch master updated: [SPARK-42881][SQL] Codegen Support for get_json_object
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c2525308330 [SPARK-42881][SQL] Codegen Support for get_json_object c2525308330 is described below commit c252530833097759b1f943ff89b05f22025f0dd0 Author: panbingkun AuthorDate: Wed Oct 11 17:42:48 2023 +0300 [SPARK-42881][SQL] Codegen Support for get_json_object ### What changes were proposed in this pull request? The PR adds Codegen Support for get_json_object. ### Why are the changes needed? Improve codegen coverage and performance. Github benchmark data(https://github.com/panbingkun/spark/actions/runs/4497396473/jobs/7912952710): https://user-images.githubusercontent.com/15246973/227117793-bab38c42-dcc1-46de-a689-25a87b8f3561.png";> Local benchmark data: https://user-images.githubusercontent.com/15246973/227098745-9b360e60-fe84-4419-8b7d-073a0530816a.png";> ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Add new UT. Pass GA. Closes #40506 from panbingkun/json_code_gen. 
Authored-by: panbingkun Signed-off-by: Max Gekk --- .../sql/catalyst/expressions/jsonExpressions.scala | 121 +--- sql/core/benchmarks/JsonBenchmark-results.txt | 127 +++-- .../org/apache/spark/sql/JsonFunctionsSuite.scala | 28 + .../execution/datasources/json/JsonBenchmark.scala | 15 ++- 4 files changed, 208 insertions(+), 83 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala index e7df542ddab..04bc457b66a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala @@ -28,7 +28,8 @@ import com.fasterxml.jackson.core.json.JsonReadFeature import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.DataTypeMismatch -import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback +import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, CodeGenerator, CodegenFallback, ExprCode} +import org.apache.spark.sql.catalyst.expressions.codegen.Block.BlockHelper import org.apache.spark.sql.catalyst.json._ import org.apache.spark.sql.catalyst.trees.TreePattern.{JSON_TO_STRUCT, TreePattern} import org.apache.spark.sql.catalyst.util._ @@ -125,13 +126,7 @@ private[this] object SharedFactory { group = "json_funcs", since = "1.5.0") case class GetJsonObject(json: Expression, path: Expression) - extends BinaryExpression with ExpectsInputTypes with CodegenFallback { - - import com.fasterxml.jackson.core.JsonToken._ - - import PathInstruction._ - import SharedFactory._ - import WriteStyle._ + extends BinaryExpression with ExpectsInputTypes { override def left: Expression = json override def right: Expression = path @@ -140,18 +135,114 @@ case class 
GetJsonObject(json: Expression, path: Expression) override def nullable: Boolean = true override def prettyName: String = "get_json_object" - @transient private lazy val parsedPath = parsePath(path.eval().asInstanceOf[UTF8String]) + @transient + private lazy val evaluator = if (path.foldable) { +new GetJsonObjectEvaluator(path.eval().asInstanceOf[UTF8String]) + } else { +new GetJsonObjectEvaluator() + } override def eval(input: InternalRow): Any = { -val jsonStr = json.eval(input).asInstanceOf[UTF8String] +evaluator.setJson(json.eval(input).asInstanceOf[UTF8String]) +if (!path.foldable) { + evaluator.setPath(path.eval(input).asInstanceOf[UTF8String]) +} +evaluator.evaluate() + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evaluatorClass = classOf[GetJsonObjectEvaluator].getName +val initEvaluator = path.foldable match { + case true if path.eval() != null => +val cachedPath = path.eval().asInstanceOf[UTF8String] +val refCachedPath = ctx.addReferenceObj("cachedPath", cachedPath) +s"new $evaluatorClass($refCachedPath)" + case _ => s"new $evaluatorClass()" +} +val evaluator = ctx.addMutableState(evaluatorClass, "evaluator", + v => s"""$v = $initEvaluator;""", forceInline = true) + +val jsonEval = json.genCode(ctx) +val pathEval = path.genCode(ctx) + +val setJson = + s""" +
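The foldable-path optimization visible in the diff — parse the JSON path once when it is a constant, otherwise set it per row — can be sketched with a simplified stand-in evaluator (illustrative only; the real `GetJsonObjectEvaluator` parses `PathInstruction`s and walks a Jackson parser):

```scala
// Sketch of the caching pattern in the patch: a constant (foldable) path is
// handed to the evaluator once at construction time; a non-foldable path is
// re-set for every row. The evaluate body is a stand-in, not real JSON logic.
class GetJsonObjectEvaluator(cachedPath: String = null) {
  private var json: String = _
  private var path: String = cachedPath
  def setJson(j: String): Unit = { json = j }
  def setPath(p: String): Unit = { path = p }
  // Stand-in for the real JSON traversal: report what would be evaluated.
  def evaluate(): String = s"eval(json=$json, path=$path)"
}

val pathIsFoldable = true
val evaluator =
  if (pathIsFoldable) new GetJsonObjectEvaluator("$.a") // path parsed once
  else new GetJsonObjectEvaluator()                     // path set per row

evaluator.setJson("""{"a": 1}""")
```

The generated code in the patch follows the same split: when the path is foldable, `ctx.addReferenceObj` bakes the cached path into the evaluator's constructor, and only `setJson` runs per row.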
[spark] branch master updated: [SPARK-45433][SQL] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new eae5c0e1efc [SPARK-45433][SQL] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat eae5c0e1efc is described below commit eae5c0e1efce83c2bb08754784db070be285285a Author: Jia Fan AuthorDate: Wed Oct 11 19:33:23 2023 +0300 [SPARK-45433][SQL] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat ### What changes were proposed in this pull request? This PR fix CSV/JSON schema inference when timestamps do not match specified timestampFormat will report error. ```scala //eg val csv = spark.read.option("timestampFormat", "-MM-dd'T'HH:mm:ss") .option("inferSchema", true).csv(Seq("2884-06-24T02:45:51.138").toDS()) csv.show() //error Caused by: java.time.format.DateTimeParseException: Text '2884-06-24T02:45:51.138' could not be parsed, unparsed text found at index 19 ``` This bug only happend when partition had one row. The data type should be `StringType` not `TimestampType` because the value not match `timestampFormat`. Use csv as eg, in `CSVInferSchema::tryParseTimestampNTZ`, first, use `timestampNTZFormatter.parseWithoutTimeZoneOptional` to inferring return `TimestampType`, if same partition had another row, it will use `tryParseTimestamp` to parse row with user defined `timestampFormat`, then found it can't be convert to timestamp with `timestampFormat`. Finally return `StringType`. But when only one row, we use `timestampNTZFormatter.parseWithoutTimeZoneOptional` to parse normally timestamp not r [...] ### Why are the changes needed? Fix schema inference bug. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? add new test. ### Was this patch authored or co-authored using generative AI tooling? 
No Closes #43243 from Hisoka-X/SPARK-45433-inference-mismatch-timestamp-one-row. Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala | 9 ++--- .../org/apache/spark/sql/catalyst/json/JsonInferSchema.scala | 8 +--- .../apache/spark/sql/catalyst/csv/CSVInferSchemaSuite.scala| 10 ++ .../apache/spark/sql/catalyst/json/JsonInferSchemaSuite.scala | 8 4 files changed, 29 insertions(+), 6 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala index 51586a0065e..ec01b56f9eb 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala @@ -27,7 +27,7 @@ import org.apache.spark.sql.catalyst.expressions.ExprUtils import org.apache.spark.sql.catalyst.util.{DateFormatter, TimestampFormatter} import org.apache.spark.sql.catalyst.util.LegacyDateFormats.FAST_DATE_FORMAT import org.apache.spark.sql.errors.QueryExecutionErrors -import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.internal.{LegacyBehaviorPolicy, SQLConf} import org.apache.spark.sql.types._ class CSVInferSchema(val options: CSVOptions) extends Serializable { @@ -202,8 +202,11 @@ class CSVInferSchema(val options: CSVOptions) extends Serializable { // We can only parse the value as TimestampNTZType if it does not have zone-offset or // time-zone component and can be parsed with the timestamp formatter. // Otherwise, it is likely to be a timestamp with timezone. 
-if (timestampNTZFormatter.parseWithoutTimeZoneOptional(field, false).isDefined) { - SQLConf.get.timestampType +val timestampType = SQLConf.get.timestampType +if ((SQLConf.get.legacyTimeParserPolicy == LegacyBehaviorPolicy.LEGACY || +timestampType == TimestampNTZType) && +timestampNTZFormatter.parseWithoutTimeZoneOptional(field, false).isDefined) { + timestampType } else { tryParseTimestamp(field) } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala index 5385afe8c93..4123c5290b6 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala @@ -32,7 +32,7 @@ import org.apache.spark.sql.catalyst.json.JacksonUtils.nextUntil import org.apache.spark.sql.catalyst.util._ import org.
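The corrected decision in `tryParseTimestampNTZ` can be sketched with simplified stand-in types (plain Scala, not Spark's actual classes): the lenient no-timezone parse is only trusted under the LEGACY parser policy or when the session timestamp type is `TimestampNTZType`; otherwise the value must match the user's `timestampFormat`, and a mismatching row falls back to `StringType` instead of being wrongly inferred as a timestamp.

```scala
// Sketch of the fixed inference decision. The booleans stand in for the two
// parse attempts in the real code: parseWithoutTimeZoneOptional(field) and
// parsing with the user-specified timestampFormat.
sealed trait SqlType
case object TimestampType extends SqlType
case object TimestampNTZType extends SqlType
case object StringType extends SqlType

def inferType(
    legacyPolicy: Boolean,          // legacyTimeParserPolicy == LEGACY
    sessionTimestampType: SqlType,  // SQLConf.get.timestampType
    ntzParseSucceeds: Boolean,      // lenient no-timezone parse succeeded
    formatParseSucceeds: Boolean    // user's timestampFormat matched
): SqlType = {
  if ((legacyPolicy || sessionTimestampType == TimestampNTZType) && ntzParseSucceeds) {
    sessionTimestampType
  } else if (formatParseSucceeds) {
    TimestampType
  } else {
    StringType // the fix: a row not matching timestampFormat stays a string
  }
}
```

The bug scenario from the commit message — a single-row partition where `2884-06-24T02:45:51.138` passes the lenient parse but not the format `-MM-dd'T'HH:mm:ss` — now lands in the `StringType` branch instead of throwing at read time.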
[spark] branch branch-3.5 updated: [SPARK-45433][SQL] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 7e3ddc1e582 [SPARK-45433][SQL] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat 7e3ddc1e582 is described below commit 7e3ddc1e582a6e4fa96bab608c4c2bbc2c93b449 Author: Jia Fan AuthorDate: Wed Oct 11 19:33:23 2023 +0300 [SPARK-45433][SQL] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat ### What changes were proposed in this pull request? This PR fix CSV/JSON schema inference when timestamps do not match specified timestampFormat will report error. ```scala //eg val csv = spark.read.option("timestampFormat", "-MM-dd'T'HH:mm:ss") .option("inferSchema", true).csv(Seq("2884-06-24T02:45:51.138").toDS()) csv.show() //error Caused by: java.time.format.DateTimeParseException: Text '2884-06-24T02:45:51.138' could not be parsed, unparsed text found at index 19 ``` This bug only happend when partition had one row. The data type should be `StringType` not `TimestampType` because the value not match `timestampFormat`. Use csv as eg, in `CSVInferSchema::tryParseTimestampNTZ`, first, use `timestampNTZFormatter.parseWithoutTimeZoneOptional` to inferring return `TimestampType`, if same partition had another row, it will use `tryParseTimestamp` to parse row with user defined `timestampFormat`, then found it can't be convert to timestamp with `timestampFormat`. Finally return `StringType`. But when only one row, we use `timestampNTZFormatter.parseWithoutTimeZoneOptional` to parse normally timestamp not r [...] ### Why are the changes needed? Fix schema inference bug. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? add new test. ### Was this patch authored or co-authored using generative AI tooling? 
No Closes #43243 from Hisoka-X/SPARK-45433-inference-mismatch-timestamp-one-row. Authored-by: Jia Fan Signed-off-by: Max Gekk (cherry picked from commit eae5c0e1efce83c2bb08754784db070be285285a) Signed-off-by: Max Gekk --- .../org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala | 9 ++--- .../org/apache/spark/sql/catalyst/json/JsonInferSchema.scala | 8 +--- .../apache/spark/sql/catalyst/csv/CSVInferSchemaSuite.scala| 10 ++ .../apache/spark/sql/catalyst/json/JsonInferSchemaSuite.scala | 8 4 files changed, 29 insertions(+), 6 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala index 51586a0065e..ec01b56f9eb 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala @@ -27,7 +27,7 @@ import org.apache.spark.sql.catalyst.expressions.ExprUtils import org.apache.spark.sql.catalyst.util.{DateFormatter, TimestampFormatter} import org.apache.spark.sql.catalyst.util.LegacyDateFormats.FAST_DATE_FORMAT import org.apache.spark.sql.errors.QueryExecutionErrors -import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.internal.{LegacyBehaviorPolicy, SQLConf} import org.apache.spark.sql.types._ class CSVInferSchema(val options: CSVOptions) extends Serializable { @@ -202,8 +202,11 @@ class CSVInferSchema(val options: CSVOptions) extends Serializable { // We can only parse the value as TimestampNTZType if it does not have zone-offset or // time-zone component and can be parsed with the timestamp formatter. // Otherwise, it is likely to be a timestamp with timezone. 
-if (timestampNTZFormatter.parseWithoutTimeZoneOptional(field, false).isDefined) { - SQLConf.get.timestampType +val timestampType = SQLConf.get.timestampType +if ((SQLConf.get.legacyTimeParserPolicy == LegacyBehaviorPolicy.LEGACY || +timestampType == TimestampNTZType) && +timestampNTZFormatter.parseWithoutTimeZoneOptional(field, false).isDefined) { + timestampType } else { tryParseTimestamp(field) } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala index 5385afe8c93..4123c5290b6 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala @@ -32,7 +32,7 @@ import org.apache.spark.sql.catalyst.json.Ja
[spark] branch branch-3.4 updated: [SPARK-45433][SQL][3.4] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new f985d716e164 [SPARK-45433][SQL][3.4] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat f985d716e164 is described below commit f985d716e164885575ec7f36a7782694411da024 Author: Jia Fan AuthorDate: Thu Oct 12 17:09:48 2023 +0500 [SPARK-45433][SQL][3.4] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat ### What changes were proposed in this pull request? This is a backport PR of #43243. Fix the bug of schema inference when timestamps do not match specified timestampFormat. Please check #43243 for detail. ### Why are the changes needed? Fix schema inference bug on 3.4. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? add new test. ### Was this patch authored or co-authored using generative AI tooling? Closes #43343 from Hisoka-X/backport-SPARK-45433-inference-schema. 
Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala | 8 ++-- .../org/apache/spark/sql/catalyst/json/JsonInferSchema.scala | 7 +-- .../apache/spark/sql/catalyst/csv/CSVInferSchemaSuite.scala| 10 ++ .../apache/spark/sql/catalyst/json/JsonInferSchemaSuite.scala | 8 4 files changed, 29 insertions(+), 4 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala index 51586a0065e9..dd8ac3985f19 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala @@ -28,6 +28,7 @@ import org.apache.spark.sql.catalyst.util.{DateFormatter, TimestampFormatter} import org.apache.spark.sql.catalyst.util.LegacyDateFormats.FAST_DATE_FORMAT import org.apache.spark.sql.errors.QueryExecutionErrors import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.internal.SQLConf.LegacyBehaviorPolicy import org.apache.spark.sql.types._ class CSVInferSchema(val options: CSVOptions) extends Serializable { @@ -202,8 +203,11 @@ class CSVInferSchema(val options: CSVOptions) extends Serializable { // We can only parse the value as TimestampNTZType if it does not have zone-offset or // time-zone component and can be parsed with the timestamp formatter. // Otherwise, it is likely to be a timestamp with timezone. 
-if (timestampNTZFormatter.parseWithoutTimeZoneOptional(field, false).isDefined) { - SQLConf.get.timestampType +val timestampType = SQLConf.get.timestampType +if ((SQLConf.get.legacyTimeParserPolicy == LegacyBehaviorPolicy.LEGACY || +timestampType == TimestampNTZType) && +timestampNTZFormatter.parseWithoutTimeZoneOptional(field, false).isDefined) { + timestampType } else { tryParseTimestamp(field) } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala index 5385afe8c935..7e4767750fd3 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala @@ -33,6 +33,7 @@ import org.apache.spark.sql.catalyst.util._ import org.apache.spark.sql.catalyst.util.LegacyDateFormats.FAST_DATE_FORMAT import org.apache.spark.sql.errors.QueryExecutionErrors import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.internal.SQLConf.LegacyBehaviorPolicy import org.apache.spark.sql.types._ import org.apache.spark.util.Utils @@ -148,11 +149,13 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable { val bigDecimal = decimalParser(field) DecimalType(bigDecimal.precision, bigDecimal.scale) } +val timestampType = SQLConf.get.timestampType if (options.prefersDecimal && decimalTry.isDefined) { decimalTry.get -} else if (options.inferTimestamp && +} else if (options.inferTimestamp && (SQLConf.get.legacyTimeParserPolicy == + LegacyBehaviorPolicy.LEGACY || timestampType == TimestampNTZType) && timestampNTZFormatter.parseWithoutTimeZoneOptional(field, false).isDefined) { - SQLConf.get.timestampType + timestampType } else if (options.inferTimestamp && timestampFormatter.parseOptional(field).isDefined) {
[spark] branch master updated: [SPARK-44262][SQL] Add `dropTable` and `getInsertStatement` to JdbcDialect
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6994bad5e6e [SPARK-44262][SQL] Add `dropTable` and `getInsertStatement` to JdbcDialect 6994bad5e6e is described below commit 6994bad5e6ea8700d48cbe20e9b406b89925adc7 Author: Jia Fan AuthorDate: Mon Oct 16 13:55:24 2023 +0500 [SPARK-44262][SQL] Add `dropTable` and `getInsertStatement` to JdbcDialect ### What changes were proposed in this pull request? 1. This PR add `dropTable` function to `JdbcDialect`. So user can override dropTable SQL by other JdbcDialect like Neo4J Neo4J Drop case ```sql MATCH (m:Person {name: 'Mark'}) DELETE m ``` 2. Also add `getInsertStatement` for same reason. Neo4J Insert case ```sql MATCH (p:Person {name: 'Jennifer'}) SET p.birthdate = date('1980-01-01') RETURN p ``` Neo4J SQL(in fact named `CQL`) not like normal SQL, but it have JDBC driver. ### Why are the changes needed? Make JdbcDialect more useful ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? exist test Closes #41855 from Hisoka-X/SPARK-44262_JDBCUtils_improve. Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../sql/execution/datasources/jdbc/JdbcUtils.scala | 14 +-- .../org/apache/spark/sql/jdbc/JdbcDialects.scala | 29 ++ 2 files changed, 35 insertions(+), 8 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala index fb9e11df188..f2b84810175 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala @@ -78,7 +78,8 @@ object JdbcUtils extends Logging with SQLConfHelper { * Drops a table from the JDBC database. 
*/ def dropTable(conn: Connection, table: String, options: JDBCOptions): Unit = { -executeStatement(conn, options, s"DROP TABLE $table") +val dialect = JdbcDialects.get(options.url) +executeStatement(conn, options, dialect.dropTable(table)) } /** @@ -114,22 +115,19 @@ object JdbcUtils extends Logging with SQLConfHelper { isCaseSensitive: Boolean, dialect: JdbcDialect): String = { val columns = if (tableSchema.isEmpty) { - rddSchema.fields.map(x => dialect.quoteIdentifier(x.name)).mkString(",") + rddSchema.fields } else { // The generated insert statement needs to follow rddSchema's column sequence and // tableSchema's column names. When appending data into some case-sensitive DBMSs like // PostgreSQL/Oracle, we need to respect the existing case-sensitive column names instead of // RDD column names for user convenience. - val tableColumnNames = tableSchema.get.fieldNames rddSchema.fields.map { col => -val normalizedName = tableColumnNames.find(f => conf.resolver(f, col.name)).getOrElse { +tableSchema.get.find(f => conf.resolver(f.name, col.name)).getOrElse { throw QueryCompilationErrors.columnNotFoundInSchemaError(col, tableSchema) } -dialect.quoteIdentifier(normalizedName) - }.mkString(",") + } } -val placeholders = rddSchema.fields.map(_ => "?").mkString(",") -s"INSERT INTO $table ($columns) VALUES ($placeholders)" +dialect.insertIntoTable(table, columns) } /** diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala index 22625523a04..37c378c294c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala @@ -193,6 +193,24 @@ abstract class JdbcDialect extends Serializable with Logging { statement.executeUpdate(s"CREATE TABLE $tableName ($strSchema) $createTableOptions") } + /** + * Returns an Insert SQL statement template for inserting a row into the target table via JDBC 
+ * conn. Use "?" as placeholder for each value to be inserted. + * E.g. `INSERT INTO t ("name", "age", "gender") VALUES (?, ?, ?)` + * + * @param table The name of the table. + * @param fields The fields of the row that will be inserted. + * @return The SQL query to use for insert data into table. + */ + @Since("4.0.0") + def insertIntoTable( + table: String, + fields: Array[StructField]): String = { +
[spark] branch master updated: [SPARK-45035][SQL] Fix ignoreCorruptFiles/ignoreMissingFiles with multiline CSV/JSON will report error
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 11e7ea4f11d [SPARK-45035][SQL] Fix ignoreCorruptFiles/ignoreMissingFiles with multiline CSV/JSON will report error 11e7ea4f11d is described below commit 11e7ea4f11df71e2942322b01fcaab57dac20c83 Author: Jia Fan AuthorDate: Wed Oct 18 11:06:43 2023 +0500 [SPARK-45035][SQL] Fix ignoreCorruptFiles/ignoreMissingFiles with multiline CSV/JSON will report error ### What changes were proposed in this pull request? Fix ignoreCorruptFiles/ignoreMissingFiles with multiline CSV/JSON will report error, it would be like: ```log org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4940.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4940.0 (TID 4031) (10.68.177.106 executor 0): com.univocity.parsers.common.TextParsingException: java.lang.IllegalStateException - Error reading from input Parser Configuration: CsvParserSettings: Auto configuration enabled=true Auto-closing enabled=true Autodetect column delimiter=false Autodetect quotes=false Column reordering enabled=true Delimiters for detection=null Empty value= Escape unquoted values=false Header extraction enabled=null Headers=null Ignore leading whitespaces=false Ignore leading whitespaces in quotes=false Ignore trailing whitespaces=false Ignore trailing whitespaces in quotes=false Input buffer size=1048576 Input reading on separate thread=false Keep escape sequences=false Keep quotes=false Length of content displayed on error=1000 Line separator detection enabled=true Maximum number of characters per column=-1 Maximum number of columns=20480 Normalize escaped line separators=true Null value= Number of records to read=all Processor=none Restricting data in exceptions=false RowProcessor error handler=null Selected fields=none Skip bits as 
whitespace=true Skip empty lines=true Unescaped quote handling=STOP_AT_DELIMITERFormat configuration: CsvFormat: Comment character=# Field delimiter=, Line separator (normalized)=\n Line separator sequence=\n Quote character=" Quote escape character=\ Quote escape escape character=null Internal state when error was thrown: line=0, column=0, record=0 at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:402) at com.univocity.parsers.common.AbstractParser.beginParsing(AbstractParser.java:277) at com.univocity.parsers.common.AbstractParser.beginParsing(AbstractParser.java:843) at org.apache.spark.sql.catalyst.csv.UnivocityParser$$anon$1.(UnivocityParser.scala:463) at org.apache.spark.sql.catalyst.csv.UnivocityParser$.convertStream(UnivocityParser.scala:46... ``` Because multiline CSV/JSON use `BinaryFileRDD` not `FileScanRDD`. Unlike `FileScanRDD`, when met corrupt files will check `ignoreCorruptFiles` config to avoid report IOException, `BinaryFileRDD` will not report error because it return normal `PortableDataStream`. So we should catch it when infer schema in lambda function. Also do same thing for `ignoreMissingFiles`. ### Why are the changes needed? Fix the bug when use mulitline mode with ignoreCorruptFiles/ignoreMissingFiles config. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? add new test. ### Was this patch authored or co-authored using generative AI tooling? No Closes #42979 from Hisoka-X/SPARK-45035_csv_multi_line. 
Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../spark/sql/catalyst/json/JsonInferSchema.scala | 18 +-- .../execution/datasources/csv/CSVDataSource.scala | 28 --- .../datasources/CommonFileDataSourceSuite.scala| 28 +++ .../sql/execution/datasources/csv/CSVSuite.scala | 58 +- .../sql/execution/datasources/json/JsonSuite.scala | 46 - 5 files changed, 142 insertions(+), 36 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala index 4123c5290b6..4d04b34876c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spar
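The shape of that fix can be sketched outside Spark: wrap the per-file schema-inference step so that missing or corrupt files are skipped or re-raised according to the two configs. A minimal, hedged Python sketch — `read_file` stands in for the real per-file inference function and is an assumption of this example, not Spark's API:

```python
def infer_over_files(paths, read_file, ignore_corrupt_files=False,
                     ignore_missing_files=False):
    """Collect per-file schemas, skipping files according to the two configs.

    Mirrors the idea of the fix: BinaryFileRDD hands back streams without
    checking the configs, so the inference lambda must catch the errors itself.
    """
    schemas = []
    for path in paths:
        try:
            schemas.append(read_file(path))
        except FileNotFoundError:
            if not ignore_missing_files:
                raise  # a missing file is fatal unless explicitly ignored
        except OSError:
            if not ignore_corrupt_files:
                raise  # a corrupt file is fatal unless explicitly ignored
    return schemas
```

With both flags off, the first bad file still fails the whole inference, which is the pre-existing `FileScanRDD` behavior the patch brings multiline CSV/JSON in line with.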
[spark] branch master updated: [SPARK-35926][SQL] Add support YearMonthIntervalType for width_bucket
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9d061e3 [SPARK-35926][SQL] Add support YearMonthIntervalType for width_bucket 9d061e3 is described below commit 9d061e3939a021c602c070fc13cef951a8f94c82 Author: PengLei AuthorDate: Fri Oct 15 17:15:50 2021 +0300 [SPARK-35926][SQL] Add support YearMonthIntervalType for width_bucket ### What changes were proposed in this pull request? Support width_bucket(YearMonthIntervalType, YearMonthIntervalType, YearMonthIntervalType, Long), which returns a long result, e.g.: ``` width_bucket(input_value, min_value, max_value, bucket_nums) width_bucket(INTERVAL '1' YEAR, INTERVAL '0' YEAR, INTERVAL '10' YEAR, 10) It divides the range between max_value and min_value into 10 buckets: [INTERVAL '0' YEAR, INTERVAL '1' YEAR), [INTERVAL '1' YEAR, INTERVAL '2' YEAR) .. [INTERVAL '9' YEAR, INTERVAL '10' YEAR) Then it calculates which bucket the given input_value falls into. ``` The function `width_bucket` was introduced in [SPARK-21117](https://issues.apache.org/jira/browse/SPARK-21117) ### Why are the changes needed? [35926](https://issues.apache.org/jira/browse/SPARK-35926) 1. The `WIDTH_BUCKET` function assigns values to buckets (individual segments) in an equiwidth histogram. The ANSI SQL standard syntax is as follows: `WIDTH_BUCKET(expression, min, max, buckets)`. [Reference](https://www.oreilly.com/library/view/sql-in-a/9780596155322/re91.html). 2. `WIDTH_BUCKET` currently supports only `Double`. Of course, we can cast `Int` to `Double` to use it, but we could not cast `YearMonthIntervalType` to `Double`. 3. It has a use case, e.g. a histogram of employees' years of service, where `years of service` is a column of `YearMonthIntervalType` data type. ### Does this PR introduce _any_ user-facing change? Yes.
The user can use `width_bucket` with YearMonthIntervalType. ### How was this patch tested? Add ut test Closes #33132 from Peng-Lei/SPARK-35926. Authored-by: PengLei Signed-off-by: Max Gekk --- .../sql/catalyst/expressions/mathExpressions.scala | 33 ++ .../expressions/MathExpressionsSuite.scala | 15 ++ .../test/resources/sql-tests/inputs/interval.sql | 2 ++ .../sql-tests/results/ansi/interval.sql.out| 18 +++- .../resources/sql-tests/results/interval.sql.out | 18 +++- .../org/apache/spark/sql/MathFunctionsSuite.scala | 17 +++ 6 files changed, 96 insertions(+), 7 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala index c14fa72..6c34ed6 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala @@ -25,7 +25,7 @@ import org.apache.spark.sql.catalyst.analysis.{FunctionRegistry, TypeCheckResult import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{TypeCheckFailure, TypeCheckSuccess} import org.apache.spark.sql.catalyst.expressions.codegen._ import org.apache.spark.sql.catalyst.expressions.codegen.Block._ -import org.apache.spark.sql.catalyst.util.NumberConverter +import org.apache.spark.sql.catalyst.util.{NumberConverter, TypeUtils} import org.apache.spark.sql.types._ import org.apache.spark.unsafe.types.UTF8String @@ -1613,6 +1613,10 @@ object WidthBucket { 5 > SELECT _FUNC_(-0.9, 5.2, 0.5, 2); 3 + > SELECT _FUNC_(INTERVAL '0' YEAR, INTERVAL '0' YEAR, INTERVAL '10' YEAR, 10); + 1 + > SELECT _FUNC_(INTERVAL '1' YEAR, INTERVAL '0' YEAR, INTERVAL '10' YEAR, 10); + 2 """, since = "3.1.0", group = "math_funcs") @@ -1623,16 +1627,35 @@ case class WidthBucket( numBucket: Expression) extends QuaternaryExpression with ImplicitCastInputTypes with NullIntolerant { - 
override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType, DoubleType, LongType) + override def inputTypes: Seq[AbstractDataType] = Seq( +TypeCollection(DoubleType, YearMonthIntervalType), +TypeCollection(DoubleType, YearMonthIntervalType), +TypeCollection(DoubleType, YearMonthIntervalType), +LongType) + + override def checkInputDataTypes(): TypeCheckResult = { +super.checkInputDataTypes() match { + case TypeCheckSuccess => +(value.dataType, minValue.dataType, ma
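The bucketing described above can be modeled in plain Python. This is a hedged sketch of PostgreSQL-style `WIDTH_BUCKET` semantics (ascending and descending bounds, with 0 and num_buckets + 1 for out-of-range values), not a transcription of Spark's `WidthBucket` implementation; a year-month interval is represented by its month count, matching its physical int encoding:

```python
import math

def width_bucket(value, min_value, max_value, num_buckets):
    """Assign value to one of num_buckets equi-width buckets over
    [min_value, max_value): 0 means below the range, num_buckets + 1 above it."""
    if num_buckets <= 0 or min_value == max_value:
        return None  # invalid arguments; this sketch yields no bucket
    if min_value < max_value:  # ascending range
        if value < min_value:
            return 0
        if value >= max_value:
            return num_buckets + 1
        return math.floor((value - min_value) * num_buckets
                          / (max_value - min_value)) + 1
    # descending range (min_value > max_value), mirrored
    if value > min_value:
        return 0
    if value <= max_value:
        return num_buckets + 1
    return math.floor((min_value - value) * num_buckets
                      / (min_value - max_value)) + 1
```

With ten buckets over 0..120 months, INTERVAL '1' YEAR (12 months) lands in bucket 2, matching the second docstring example in the diff above.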
[spark] branch master updated (c29bb02 -> 21fa3ce)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c29bb02 [SPARK-36965][PYTHON] Extend python test runner by logging out the temp output files add 21fa3ce [SPARK-35925][SQL] Support DayTimeIntervalType in width-bucket function No new revisions were added by this update. Summary of changes: .../sql/catalyst/expressions/mathExpressions.scala | 12 +--- .../catalyst/expressions/MathExpressionsSuite.scala | 20 .../src/test/resources/sql-tests/inputs/interval.sql | 2 ++ .../sql-tests/results/ansi/interval.sql.out | 18 +- .../resources/sql-tests/results/interval.sql.out | 18 +- 5 files changed, 65 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-36928][SQL] Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new fd8d5ad [SPARK-36928][SQL] Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray fd8d5ad is described below commit fd8d5ad2140d6405357b908dce2d00a21036dedb Author: PengLei AuthorDate: Thu Oct 28 14:52:41 2021 +0300 [SPARK-36928][SQL] Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray ### What changes were proposed in this pull request? 1. Handle the ANSI interval types in the `get` and `copy` methods of ColumnarArray 2. Handle the ANSI interval types in the `get` and `copy` methods of ColumnarBatchRow 3. Handle the ANSI interval types in the `get` and `copy` methods of ColumnarRow ### Why are the changes needed? [SPARK-36928](https://issues.apache.org/jira/browse/SPARK-36928) ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added a test case Closes #34421 from Peng-Lei/SPARK-36928.
Authored-by: PengLei Signed-off-by: Max Gekk --- .../apache/spark/sql/vectorized/ColumnarArray.java | 6 +- .../spark/sql/vectorized/ColumnarBatchRow.java | 8 +-- .../apache/spark/sql/vectorized/ColumnarRow.java | 8 +-- .../execution/vectorized/ColumnVectorSuite.scala | 69 ++ .../execution/vectorized/ColumnarBatchSuite.scala | 32 ++ 5 files changed, 113 insertions(+), 10 deletions(-) diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java b/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java index 147dd24..2fb6b3f 100644 --- a/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java +++ b/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java @@ -57,9 +57,11 @@ public final class ColumnarArray extends ArrayData { return UnsafeArrayData.fromPrimitiveArray(toByteArray()); } else if (dt instanceof ShortType) { return UnsafeArrayData.fromPrimitiveArray(toShortArray()); -} else if (dt instanceof IntegerType || dt instanceof DateType) { +} else if (dt instanceof IntegerType || dt instanceof DateType +|| dt instanceof YearMonthIntervalType) { return UnsafeArrayData.fromPrimitiveArray(toIntArray()); -} else if (dt instanceof LongType || dt instanceof TimestampType) { +} else if (dt instanceof LongType || dt instanceof TimestampType +|| dt instanceof DayTimeIntervalType) { return UnsafeArrayData.fromPrimitiveArray(toLongArray()); } else if (dt instanceof FloatType) { return UnsafeArrayData.fromPrimitiveArray(toFloatArray()); diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarBatchRow.java b/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarBatchRow.java index c6b7287e7..8c32d5c 100644 --- a/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarBatchRow.java +++ b/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarBatchRow.java @@ -52,9 +52,9 @@ public final class ColumnarBatchRow extends 
InternalRow { row.setByte(i, getByte(i)); } else if (dt instanceof ShortType) { row.setShort(i, getShort(i)); -} else if (dt instanceof IntegerType) { +} else if (dt instanceof IntegerType || dt instanceof YearMonthIntervalType) { row.setInt(i, getInt(i)); -} else if (dt instanceof LongType) { +} else if (dt instanceof LongType || dt instanceof DayTimeIntervalType) { row.setLong(i, getLong(i)); } else if (dt instanceof FloatType) { row.setFloat(i, getFloat(i)); @@ -151,9 +151,9 @@ public final class ColumnarBatchRow extends InternalRow { return getByte(ordinal); } else if (dataType instanceof ShortType) { return getShort(ordinal); -} else if (dataType instanceof IntegerType) { +} else if (dataType instanceof IntegerType || dataType instanceof YearMonthIntervalType) { return getInt(ordinal); -} else if (dataType instanceof LongType) { +} else if (dataType instanceof LongType || dataType instanceof DayTimeIntervalType) { return getLong(ordinal); } else if (dataType instanceof FloatType) { return getFloat(ordinal); diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarRow.java b/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarRow.java index 4b9d3c5..da4b242 100644 --- a/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarRow.java +++ b/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarRow.java @@ -61,9 +61,9 @@ public final class ColumnarRow extends InternalRow { row.setByte(i, getByte(i
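All three diffs make the same change: route the two ANSI interval types through the existing int and long branches, since a year-month interval is physically an int (a month count) and a day-time interval a long (microseconds). A hedged Python sketch of that dispatch, with type names as plain strings rather than Spark's classes:

```python
# Logical types grouped by the primitive that backs them in the column vectors.
INT_BACKED = {"IntegerType", "DateType", "YearMonthIntervalType"}
LONG_BACKED = {"LongType", "TimestampType", "DayTimeIntervalType"}

def physical_slot(data_type):
    """Return which primitive accessor (getInt vs. getLong) a type uses."""
    if data_type in INT_BACKED:
        return "int"
    if data_type in LONG_BACKED:
        return "long"
    raise TypeError(f"no primitive mapping sketched for {data_type}")
```

The fix is exactly the addition of the two interval names to these sets; everything downstream (get, copy, UnsafeArrayData conversion) already knows how to handle int- and long-backed columns.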
[spark] branch master updated: [SPARK-37138][SQL] Support ANSI Interval types in ApproxCountDistinctForIntervals/ApproximatePercentile/Percentile
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 08123a3 [SPARK-37138][SQL] Support ANSI Interval types in ApproxCountDistinctForIntervals/ApproximatePercentile/Percentile 08123a3 is described below commit 08123a3795683238352e5bf55452de381349fdd9 Author: Angerszh AuthorDate: Sat Oct 30 20:03:20 2021 +0300 [SPARK-37138][SQL] Support ANSI Interval types in ApproxCountDistinctForIntervals/ApproximatePercentile/Percentile ### What changes were proposed in this pull request? Support Ansi Interval types in the agg expressions: - ApproxCountDistinctForIntervals - ApproximatePercentile - Percentile ### Why are the changes needed? To improve user experience with Spark SQL. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added new UT. Closes #34412 from AngersZh/SPARK-37138. 
Authored-by: Angerszh Signed-off-by: Max Gekk --- .../ApproxCountDistinctForIntervals.scala | 13 +++--- .../aggregate/ApproximatePercentile.scala | 32 -- .../expressions/aggregate/Percentile.scala | 26 +--- .../ApproxCountDistinctForIntervalsSuite.scala | 6 ++- .../expressions/aggregate/PercentileSuite.scala| 8 ++-- ...ApproxCountDistinctForIntervalsQuerySuite.scala | 28 + .../sql/ApproximatePercentileQuerySuite.scala | 22 +- .../apache/spark/sql/PercentileQuerySuite.scala| 49 ++ 8 files changed, 153 insertions(+), 31 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala index a7e9a22..f3bf251 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala @@ -61,7 +61,8 @@ case class ApproxCountDistinctForIntervals( } override def inputTypes: Seq[AbstractDataType] = { -Seq(TypeCollection(NumericType, TimestampType, DateType, TimestampNTZType), ArrayType) +Seq(TypeCollection(NumericType, TimestampType, DateType, TimestampNTZType, + YearMonthIntervalType, DayTimeIntervalType), ArrayType) } // Mark as lazy so that endpointsExpression is not evaluated during tree transformation. 
@@ -79,14 +80,16 @@ case class ApproxCountDistinctForIntervals( TypeCheckFailure("The endpoints provided must be constant literals") } else { endpointsExpression.dataType match { -case ArrayType(_: NumericType | DateType | TimestampType | TimestampNTZType, _) => +case ArrayType(_: NumericType | DateType | TimestampType | TimestampNTZType | + _: AnsiIntervalType, _) => if (endpoints.length < 2) { TypeCheckFailure("The number of endpoints must be >= 2 to construct intervals") } else { TypeCheckSuccess } case _ => - TypeCheckFailure("Endpoints require (numeric or timestamp or date) type") + TypeCheckFailure("Endpoints require (numeric or timestamp or date or timestamp_ntz or " + +"interval year to month or interval day to second) type") } } } @@ -120,9 +123,9 @@ case class ApproxCountDistinctForIntervals( val doubleValue = child.dataType match { case n: NumericType => n.numeric.toDouble(value.asInstanceOf[n.InternalType]) -case _: DateType => +case _: DateType | _: YearMonthIntervalType => value.asInstanceOf[Int].toDouble -case TimestampType | TimestampNTZType => +case TimestampType | TimestampNTZType | _: DayTimeIntervalType => value.asInstanceOf[Long].toDouble } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala index 8cce79c..0dcb906 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala @@ -49,15 +49,16 @@ import org.apache.spark.sql.types._ * yields better accuracy, the default value is * DEFAULT_PERCENTILE_ACCURACY. */ +// scalas
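The endpoint check in `checkInputDataTypes` above requires at least two constant endpoints, because n endpoints define only n - 1 adjacent intervals. A small Python sketch of that rule — the error message is taken from the diff, while the helper itself is illustrative, not Spark's API:

```python
def endpoints_to_intervals(endpoints):
    """Turn sorted endpoints [e0, e1, ..., en] into the adjacent intervals
    [(e0, e1), (e1, e2), ...] after validating there are enough of them."""
    if len(endpoints) < 2:
        raise ValueError(
            "The number of endpoints must be >= 2 to construct intervals")
    return list(zip(endpoints, endpoints[1:]))
```

For year-month interval endpoints this operates on month counts, since that is the physical value the aggregate converts to double before bucketing.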
[spark] branch master updated (13c372d -> d43a678)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 13c372d [SPARK-37150][SQL] Migrate DESCRIBE NAMESPACE to use V2 command by default add d43a678 [SPARK-37161][SQL] RowToColumnConverter support AnsiIntervalType No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/execution/Columnar.scala | 4 +-- .../execution/vectorized/ColumnarBatchSuite.scala | 37 ++ 2 files changed, 39 insertions(+), 2 deletions(-)
[spark] branch master updated: [SPARK-37176][SQL] Sync JsonInferSchema#infer method's exception handle logic with JacksonParser#parse method
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ec6a3ae [SPARK-37176][SQL] Sync JsonInferSchema#infer method's exception handle logic with JacksonParser#parse method ec6a3ae is described below commit ec6a3ae6dff1dc9c63978ae14a1793ccd771 Author: Xianjin YE AuthorDate: Tue Nov 2 12:40:09 2021 +0300 [SPARK-37176][SQL] Sync JsonInferSchema#infer method's exception handle logic with JacksonParser#parse method ### What changes were proposed in this pull request? Change `JsonInferSchema#infer`'s exception handling logic to align with `JacksonParser#parse`. ### Why are the changes needed? To reduce behavior inconsistency: users can have the same expectations for schema inference and JSON parsing when dealing with malformed input. ### Does this PR introduce _any_ user-facing change? Yes. Before this patch, inferring a JSON schema could fail for some malformed input on which parsing succeeded. After this patch, they share the same exception handling logic. ### How was this patch tested? Added one new test and modified one existing test to cover the new case. Closes #34455 from advancedxy/SPARK-37176.
Authored-by: Xianjin YE Signed-off-by: Max Gekk --- .../spark/sql/catalyst/json/JsonInferSchema.scala | 33 +++- .../test/resources/test-data/malformed_utf8.json | 3 ++ .../sql/execution/datasources/json/JsonSuite.scala | 35 ++ 3 files changed, 63 insertions(+), 8 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala index 3b17cde..3b62b16 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala @@ -17,6 +17,8 @@ package org.apache.spark.sql.catalyst.json +import java.io.CharConversionException +import java.nio.charset.MalformedInputException import java.util.Comparator import scala.util.control.Exception.allCatch @@ -45,6 +47,18 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable { legacyFormat = FAST_DATE_FORMAT, isParsing = true) + private def handleJsonErrorsByParseMode(parseMode: ParseMode, + columnNameOfCorruptRecord: String, e: Throwable): Option[StructType] = { +parseMode match { + case PermissiveMode => +Some(StructType(Seq(StructField(columnNameOfCorruptRecord, StringType + case DropMalformedMode => +None + case FailFastMode => +throw QueryExecutionErrors.malformedRecordsDetectedInSchemaInferenceError(e) +} + } + /** * Infer the type of a collection of json records in three stages: * 1. 
Infer the type of each record @@ -68,14 +82,17 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable { Some(inferField(parser)) } } catch { - case e @ (_: RuntimeException | _: JsonProcessingException) => parseMode match { -case PermissiveMode => - Some(StructType(Seq(StructField(columnNameOfCorruptRecord, StringType -case DropMalformedMode => - None -case FailFastMode => - throw QueryExecutionErrors.malformedRecordsDetectedInSchemaInferenceError(e) - } + case e @ (_: RuntimeException | _: JsonProcessingException | +_: MalformedInputException) => +handleJsonErrorsByParseMode(parseMode, columnNameOfCorruptRecord, e) + case e: CharConversionException if options.encoding.isEmpty => +val msg = + """JSON parser cannot handle a character in its input. +|Specifying encoding as an input option explicitly might help to resolve the issue. +|""".stripMargin + e.getMessage +val wrappedCharException = new CharConversionException(msg) +wrappedCharException.initCause(e) +handleJsonErrorsByParseMode(parseMode, columnNameOfCorruptRecord, wrappedCharException) } }.reduceOption(typeMerger).toIterator } diff --git a/sql/core/src/test/resources/test-data/malformed_utf8.json b/sql/core/src/test/resources/test-data/malformed_utf8.json new file mode 100644 index 000..c57eb43 --- /dev/null +++ b/sql/core/src/test/resources/test-data/malformed_utf8.json @@ -0,0 +1,3 @@ +{"a": 1} +{"a": 1} +� \ No newline at end of file diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/dataso
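The extracted `handleJsonErrorsByParseMode` boils down to a three-way dispatch on the parse mode. A hedged Python sketch of that decision — the string constants and dict return value stand in for Spark's `ParseMode` objects and `StructType`:

```python
def handle_malformed(parse_mode, corrupt_record_column, error):
    """What a malformed record contributes to the inferred schema:
    PERMISSIVE    -> a single string column holding the raw record,
    DROPMALFORMED -> nothing (the record is silently dropped),
    FAILFAST      -> the whole inference fails."""
    if parse_mode == "PERMISSIVE":
        return {corrupt_record_column: "string"}
    if parse_mode == "DROPMALFORMED":
        return None
    raise RuntimeError(
        "Malformed records are detected in schema inference") from error
```

The patch routes both the original `RuntimeException`/`JsonProcessingException` cases and the newly handled `MalformedInputException`/`CharConversionException` cases through this one function, which is what makes inference consistent with parsing.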
[spark] branch master updated: [SPARK-24774][SQL][FOLLOWUP] Remove unused code in SchemaConverters.scala
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 59c55dd [SPARK-24774][SQL][FOLLOWUP] Remove unused code in SchemaConverters.scala 59c55dd is described below commit 59c55dd4c6f7772ef7949653679a2b76211788e8 Author: Gengliang Wang AuthorDate: Wed Nov 3 08:43:25 2021 +0300 [SPARK-24774][SQL][FOLLOWUP] Remove unused code in SchemaConverters.scala ### What changes were proposed in this pull request? As MaxGekk pointed out in https://github.com/apache/spark/pull/22037/files#r741373793, there is some unused code in SchemaConverters.scala. The UUID generator was for generating `fix` avro field names, but we figured out a better solution during PR review. This PR removes the dead code. ### Why are the changes needed? Code cleanup. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing UT. Closes #34472 from gengliangwang/SPARK-24774-followup.
Authored-by: Gengliang Wang Signed-off-by: Max Gekk --- .../src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala | 4 1 file changed, 4 deletions(-) diff --git a/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala b/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala index 1c9b06b..347364c 100644 --- a/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala +++ b/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala @@ -18,14 +18,12 @@ package org.apache.spark.sql.avro import scala.collection.JavaConverters._ -import scala.util.Random import org.apache.avro.{LogicalTypes, Schema, SchemaBuilder} import org.apache.avro.LogicalTypes.{Date, Decimal, LocalTimestampMicros, LocalTimestampMillis, TimestampMicros, TimestampMillis} import org.apache.avro.Schema.Type._ import org.apache.spark.annotation.DeveloperApi -import org.apache.spark.sql.catalyst.util.RandomUUIDGenerator import org.apache.spark.sql.types._ import org.apache.spark.sql.types.Decimal.minBytesForPrecision @@ -35,8 +33,6 @@ import org.apache.spark.sql.types.Decimal.minBytesForPrecision */ @DeveloperApi object SchemaConverters { - private lazy val uuidGenerator = RandomUUIDGenerator(new Random().nextLong()) - private lazy val nullSchema = Schema.create(Schema.Type.NULL) /**
[spark] branch master updated: [SPARK-37261][SQL] Allow adding partitions with ANSI intervals in DSv2
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2a1267a [SPARK-37261][SQL] Allow adding partitions with ANSI intervals in DSv2 2a1267a is described below commit 2a1267aeb75bf838c74d1cf274aa258be060c17b Author: Max Gekk AuthorDate: Wed Nov 10 15:21:33 2021 +0300 [SPARK-37261][SQL] Allow adding partitions with ANSI intervals in DSv2 ### What changes were proposed in this pull request? In the PR, I propose to skip checking of ANSI interval types while creating or writing to a table using V2 catalogs. As a consequence, users can create tables in V2 catalogs partitioned by ANSI interval columns (the legacy intervals of `CalendarIntervalType` are still prohibited). This PR also adds a new test which checks: 1. Adding a new partition with ANSI intervals via `ALTER TABLE .. ADD PARTITION` 2. INSERT INTO a table partitioned by ANSI intervals for V1/V2 In-Memory catalogs (skips the V1 Hive external catalog). ### Why are the changes needed? To allow users to save ANSI intervals as partition values using DSv2. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running the new test for V1/V2 In-Memory and V1 Hive external catalogs: ``` $ build/sbt "test:testOnly org.apache.spark.sql.execution.command.v1.AlterTableAddPartitionSuite" $ build/sbt "test:testOnly org.apache.spark.sql.execution.command.v2.AlterTableAddPartitionSuite" $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly org.apache.spark.sql.hive.execution.command.AlterTableAddPartitionSuite" $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *DataSourceV2SQLSuite" ``` Closes #34537 from MaxGekk/alter-table-ansi-interval.
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../sql/catalyst/analysis/CheckAnalysis.scala | 6 ++-- .../apache/spark/sql/catalyst/util/TypeUtils.scala | 4 +-- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 16 + .../command/AlterTableAddPartitionSuiteBase.scala | 40 +- 4 files changed, 54 insertions(+), 12 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 5bf37a2..1a105ad 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -464,10 +464,12 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog { failAnalysis(s"Invalid partitioning: ${badReferences.mkString(", ")}") } -create.tableSchema.foreach(f => TypeUtils.failWithIntervalType(f.dataType)) +create.tableSchema.foreach(f => + TypeUtils.failWithIntervalType(f.dataType, forbidAnsiIntervals = false)) case write: V2WriteCommand if write.resolved => -write.query.schema.foreach(f => TypeUtils.failWithIntervalType(f.dataType)) +write.query.schema.foreach(f => + TypeUtils.failWithIntervalType(f.dataType, forbidAnsiIntervals = false)) case alter: AlterTableCommand => checkAlterTableCommand(alter) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala index cba3a9a..144508c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala @@ -98,8 +98,8 @@ object TypeUtils { case _ => false } - def failWithIntervalType(dataType: DataType): Unit = { -invokeOnceForInterval(dataType, forbidAnsiIntervals = true) { + def failWithIntervalType(dataType: DataType, forbidAnsiIntervals: 
Boolean = true): Unit = { +invokeOnceForInterval(dataType, forbidAnsiIntervals) { throw QueryCompilationErrors.cannotUseIntervalTypeInTableSchemaError() } } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala index f03792f..499638c 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala @@ -340,13 +340,15 @@ class DataSourceV2SQLSuite } test("CTAS/RTAS: invalid schema if has interval type") { -Seq("CREATE"
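The parameterized check can be modeled as a small gate: the legacy `CalendarIntervalType` is always rejected in table schemas, while the two ANSI interval types are rejected only when the flag asks for it (DSv2 create/write now passes `forbidAnsiIntervals = false`). A hedged Python sketch, with type names as plain strings and the error text an assumption of this example:

```python
ANSI_INTERVALS = {"YearMonthIntervalType", "DayTimeIntervalType"}

def fail_with_interval_type(data_type, forbid_ansi_intervals=True):
    """Raise if the type may not appear in a table schema.

    Legacy CalendarIntervalType is always forbidden; ANSI intervals are
    forbidden only when forbid_ansi_intervals is True.
    """
    if data_type == "CalendarIntervalType" or (
            forbid_ansi_intervals and data_type in ANSI_INTERVALS):
        raise TypeError("Cannot use interval type in the table schema.")
```

Keeping the default at `True` preserves the old behavior for every caller that was not updated, so only the two DSv2 paths touched in `CheckAnalysis` become more permissive.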
[spark] branch master updated (9191632 -> a4b8a8d)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9191632 [SPARK-36825][FOLLOWUP] Move the test code from `ParquetIOSuite` to `ParquetFileFormatSuite` add a4b8a8d [SPARK-37294][SQL][TESTS] Check inserting of ANSI intervals into a table partitioned by the interval columns No new revisions were added by this update. Summary of changes: .../spark/sql/connector/DataSourceV2SQLSuite.scala | 35 +- .../org/apache/spark/sql/sources/InsertSuite.scala | 34 + 2 files changed, 68 insertions(+), 1 deletion(-)
[spark] branch master updated: [SPARK-37304][SQL] Allow ANSI intervals in v2 `ALTER TABLE .. REPLACE COLUMNS`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 71f4ee3 [SPARK-37304][SQL] Allow ANSI intervals in v2 `ALTER TABLE .. REPLACE COLUMNS` 71f4ee3 is described below commit 71f4ee38c71734128c5653b8f18a7d0bf1014b6b Author: Max Gekk AuthorDate: Fri Nov 12 17:23:36 2021 +0300 [SPARK-37304][SQL] Allow ANSI intervals in v2 `ALTER TABLE .. REPLACE COLUMNS` ### What changes were proposed in this pull request? In the PR, I propose to allow ANSI intervals: year-month and day-time intervals in the `ALTER TABLE .. REPLACE COLUMNS` command for tables in v2 catalogs (v1 catalogs don't support the command). Also added unified test suite to migrate related tests in the future. ### Why are the changes needed? To improve user experience with Spark SQL. After the changes, users can replace columns with ANSI intervals instead of removing and adding such columns. ### Does this PR introduce _any_ user-facing change? In some sense, yes. After the changes, the command doesn't output any error message. ### How was this patch tested? By running new test suite: ``` $ build/sbt "test:testOnly *AlterTableReplaceColumnsSuite" ``` Closes #34571 from MaxGekk/add-replace-ansi-interval-col. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../plans/logical/v2AlterTableCommands.scala | 2 +- .../AlterTableReplaceColumnsSuiteBase.scala| 54 ++ .../command/v2/AlterTableReplaceColumnsSuite.scala | 28 +++ 3 files changed, 83 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala index 302a810..2eb828e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala @@ -134,7 +134,7 @@ case class ReplaceColumns( table: LogicalPlan, columnsToAdd: Seq[QualifiedColType]) extends AlterTableCommand { columnsToAdd.foreach { c => -TypeUtils.failWithIntervalType(c.dataType) +TypeUtils.failWithIntervalType(c.dataType, forbidAnsiIntervals = false) } override lazy val resolved: Boolean = table.resolved && columnsToAdd.forall(_.resolved) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableReplaceColumnsSuiteBase.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableReplaceColumnsSuiteBase.scala new file mode 100644 index 000..fed4076 --- /dev/null +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableReplaceColumnsSuiteBase.scala @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command + +import java.time.{Duration, Period} + +import org.apache.spark.sql.{QueryTest, Row} + +/** + * This base suite contains unified tests for the `ALTER TABLE .. REPLACE COLUMNS` command that + * check the V2 table catalog. The tests that cannot run for all supported catalogs are + * located in more specific test suites: + * + * - V2 table catalog tests: + * `org.apache.spark.sql.execution.command.v2.AlterTableReplaceColumnsSuite` + */ +trait AlterTableReplaceColumnsSuiteBase extends QueryTest with DDLCommandTestUtils { + override val command = "ALTER TABLE .. REPLACE COLUMNS" + + test("SPARK-37304: Replace columns by ANSI intervals") { +withNamespaceAndTable("ns", "tbl") { t => + sql(s"CREATE TABLE $t (ym INTERVAL MONTH, dt INTERVAL HOUR, data STRING) $defaultUsing") + // TODO(SPARK-37303): Uncomment the command below after REPLACE COLUMNS is fixed + // sql(s"INSERT INTO $t SELECT INTERVAL '1' MONTH, INTERVAL &
[spark] branch master updated: [SPARK-37332][SQL] Allow ANSI intervals in `ALTER TABLE .. ADD COLUMNS`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0f20678 [SPARK-37332][SQL] Allow ANSI intervals in `ALTER TABLE .. ADD COLUMNS` 0f20678 is described below commit 0f20678fc50aaf26359d9751fe96b15dc2e12540 Author: Max Gekk AuthorDate: Tue Nov 16 10:30:11 2021 +0300 [SPARK-37332][SQL] Allow ANSI intervals in `ALTER TABLE .. ADD COLUMNS` ### What changes were proposed in this pull request? In the PR, I propose to allow ANSI intervals: year-month and day-time intervals in the `ALTER TABLE .. ADD COLUMNS` command for tables in v1 and v2 In-Memory catalogs. Also added a unified test suite to migrate related tests in the future. ### Why are the changes needed? To improve user experience with Spark SQL. After the changes, users will be able to add columns with ANSI intervals instead of dropping the table and creating a new one. ### Does this PR introduce _any_ user-facing change? In some sense, yes. After the changes, the command doesn't output any error message. ### How was this patch tested? By running the new test suite: ``` $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableAddColumnsSuite" $ build/sbt -Phive-2.3 "test:testOnly *HiveDDLSuite" ``` Closes #34600 from MaxGekk/add-columns-ansi-intervals.
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../sql/catalyst/analysis/CheckAnalysis.scala | 6 +-- .../plans/logical/v2AlterTableCommands.scala | 2 +- .../apache/spark/sql/catalyst/util/TypeUtils.scala | 4 +- .../command/AlterTableAddColumnsSuiteBase.scala| 53 ++ .../command/v1/AlterTableAddColumnsSuite.scala | 38 .../command/v2/AlterTableAddColumnsSuite.scala | 28 .../spark/sql/hive/execution/HiveDDLSuite.scala| 5 -- .../command/AlterTableAddColumnsSuite.scala| 46 +++ 8 files changed, 170 insertions(+), 12 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 1a105ad..5bf37a2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -464,12 +464,10 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog { failAnalysis(s"Invalid partitioning: ${badReferences.mkString(", ")}") } -create.tableSchema.foreach(f => - TypeUtils.failWithIntervalType(f.dataType, forbidAnsiIntervals = false)) +create.tableSchema.foreach(f => TypeUtils.failWithIntervalType(f.dataType)) case write: V2WriteCommand if write.resolved => -write.query.schema.foreach(f => - TypeUtils.failWithIntervalType(f.dataType, forbidAnsiIntervals = false)) +write.query.schema.foreach(f => TypeUtils.failWithIntervalType(f.dataType)) case alter: AlterTableCommand => checkAlterTableCommand(alter) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala index 2eb828e..302a810 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala +++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala @@ -134,7 +134,7 @@ case class ReplaceColumns( table: LogicalPlan, columnsToAdd: Seq[QualifiedColType]) extends AlterTableCommand { columnsToAdd.foreach { c => -TypeUtils.failWithIntervalType(c.dataType, forbidAnsiIntervals = false) +TypeUtils.failWithIntervalType(c.dataType) } override lazy val resolved: Boolean = table.resolved && columnsToAdd.forall(_.resolved) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala index 144508c..729a26b 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala @@ -98,8 +98,8 @@ object TypeUtils { case _ => false } - def failWithIntervalType(dataType: DataType, forbidAnsiIntervals: Boolean = true): Unit = { -invokeOnceForInterval(dataType, forbidAnsiIntervals) { + def failWithIntervalType(dataType: DataType): Unit = { +
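The diffs above revolve around `TypeUtils.failWithIntervalType`, which rejects interval-typed columns and takes a `forbidAnsiIntervals` flag (default `true`) controlling whether the ANSI year-month and day-time interval types are rejected in addition to the legacy `CalendarIntervalType`. A rough Python model of that check — the type names and error text here are illustrative, not Spark's actual implementation:

```python
LEGACY_INTERVAL = {"interval"}  # stands in for CalendarIntervalType
ANSI_INTERVALS = {"interval year to month", "interval day to second"}

def fail_with_interval_type(data_type: str, forbid_ansi_intervals: bool = True) -> None:
    """Raise if the column type is an interval type that the command forbids.

    The legacy interval type is always rejected; ANSI intervals are rejected
    only when forbid_ansi_intervals is True.
    """
    if data_type in LEGACY_INTERVAL:
        raise TypeError(f"Cannot use {data_type!r} in the table schema.")
    if forbid_ansi_intervals and data_type in ANSI_INTERVALS:
        raise TypeError(f"Cannot use {data_type!r} in the table schema.")

# Callers that support ANSI intervals opt out of the stricter check:
fail_with_interval_type("interval year to month", forbid_ansi_intervals=False)  # no error
```

This mirrors why the commands in the diffs flip the flag per call site: commands that now support ANSI intervals pass `forbidAnsiIntervals = false`, while the rest keep the strict default.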
[spark] branch master updated (a6ca481 -> 7484c1b)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a6ca481 [SPARK-36346][SQL][FOLLOWUP] Rename `withAllOrcReaders` to `withAllNativeOrcReaders` add 7484c1b [SPARK-37468][SQL] Support ANSI intervals and TimestampNTZ for UnionEstimation No new revisions were added by this update. Summary of changes: .../logical/statsEstimation/UnionEstimation.scala | 8 +++- .../statsEstimation/UnionEstimationSuite.scala | 24 +++--- 2 files changed, 28 insertions(+), 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (f7be024 -> ce1f97f)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f7be024 [SPARK-37480][K8S][DOC] Sync Kubernetes configuration to latest in running-on-k8s.md add ce1f97f [SPARK-37326][SQL] Support TimestampNTZ in CSV data source No new revisions were added by this update. Summary of changes: docs/sql-data-sources-csv.md | 12 +- .../spark/sql/catalyst/csv/CSVInferSchema.scala| 24 +++ .../apache/spark/sql/catalyst/csv/CSVOptions.scala | 4 + .../sql/catalyst/csv/UnivocityGenerator.scala | 2 +- .../spark/sql/catalyst/csv/UnivocityParser.scala | 4 +- .../spark/sql/catalyst/util/DateTimeUtils.scala| 32 ++- .../sql/catalyst/util/TimestampFormatter.scala | 36 +++- .../spark/sql/errors/QueryExecutionErrors.scala| 8 +- .../sql/catalyst/util/DateTimeUtilsSuite.scala | 12 ++ .../org/apache/spark/sql/CsvFunctionsSuite.scala | 11 ++ .../sql/execution/datasources/csv/CSVSuite.scala | 216 - 11 files changed, 331 insertions(+), 30 deletions(-)
[spark] branch master updated: [SPARK-37508][SQL] Add CONTAINS() string function
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 710120a [SPARK-37508][SQL] Add CONTAINS() string function 710120a is described below commit 710120a499d6082bcec6b65ad1f8dbe4789f4bd9 Author: Angerszh AuthorDate: Wed Dec 1 12:57:22 2021 +0300 [SPARK-37508][SQL] Add CONTAINS() string function ### What changes were proposed in this pull request? Add the `CONTAINS` string function.

| function | arguments | Returns |
|---|---|---|
| CONTAINS(left, right) | left: String, right: String | Returns a BOOLEAN. The value is True if right is found inside left. Returns NULL if either input expression is NULL. Otherwise, returns False. |

### Why are the changes needed? contains() is a convenience function commonly supported by a number of database systems: - https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#contains_substr - https://docs.snowflake.com/en/sql-reference/functions/contains.html Support of the function can make the migration from other systems to Spark SQL easier. ### Does this PR introduce _any_ user-facing change? Users can use `contains(left, right)`:

| Left | Right | Return |
|--|:-:|--:|
| null | "Spark SQL" | null |
| "Spark SQL" | null | null |
| null | null | null |
| "Spark SQL" | "Spark" | true |
| "Spark SQL" | "k SQL" | true |
| "Spark SQL" | "SPARK" | false |

### How was this patch tested? Added UT Closes #34761 from AngersZh/SPARK-37508.
Authored-by: Angerszh Signed-off-by: Max Gekk --- .../sql/catalyst/analysis/FunctionRegistry.scala | 1 + .../catalyst/expressions/stringExpressions.scala | 17 .../expressions/StringExpressionsSuite.scala | 9 .../sql-functions/sql-expression-schema.md | 3 +- .../sql-tests/inputs/string-functions.sql | 10 - .../results/ansi/string-functions.sql.out | 50 +- .../sql-tests/results/string-functions.sql.out | 50 +- 7 files changed, 136 insertions(+), 4 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala index 0668460..b2788f8 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala @@ -455,6 +455,7 @@ object FunctionRegistry { expression[Ascii]("ascii"), expression[Chr]("char", true), expression[Chr]("chr"), +expression[Contains]("contains"), expression[Base64]("base64"), expression[BitLength]("bit_length"), expression[Length]("char_length", true), diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala index 2b997da..959c834 100755 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala @@ -465,6 +465,23 @@ abstract class StringPredicate extends BinaryExpression /** * A function that returns true if the string `left` contains the string `right`. */ +@ExpressionDescription( + usage = """ +_FUNC_(expr1, expr2) - Returns a boolean value if expr2 is found inside expr1. +Returns NULL if either input expression is NULL. 
+ """, + examples = """ +Examples: + > SELECT _FUNC_('Spark SQL', 'Spark'); + true + > SELECT _FUNC_('Spark SQL', 'SPARK'); + false + > SELECT _FUNC_('Spark SQL', null); + NULL + """, + since = "3.3.0", + group = "string_funcs" +) case class Contains(left: Expression, right: Expression) extends StringPredicate { override def compare(l: UTF8String, r: UTF8String): Boolean = l.contains(r) override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala index 823ce77..443a94b 100644 --- a/sql/catalyst/src
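The NULL-propagation rules from the commit message's truth table amount to three-valued logic. A small Python model (not Spark code) with `None` standing in for SQL NULL:

```python
def sql_contains(left, right):
    """CONTAINS(left, right): NULL (None) if either input is NULL,
    otherwise True iff right occurs inside left (case-sensitive)."""
    if left is None or right is None:
        return None
    return right in left

print(sql_contains("Spark SQL", "k SQL"))  # True
print(sql_contains("Spark SQL", "SPARK"))  # False (the match is case-sensitive)
print(sql_contains("Spark SQL", None))     # None (SQL NULL)
```

Note that a NULL on either side short-circuits to NULL before any substring check is attempted, which is exactly what the truth table specifies.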
[spark] branch master updated: [SPARK-37360][SQL] Support TimestampNTZ in JSON data source
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4f36978 [SPARK-37360][SQL] Support TimestampNTZ in JSON data source 4f36978 is described below commit 4f369789bd5d6cc81a85fe01a37e0ae90cbdeb6c Author: Ivan Sadikov AuthorDate: Mon Dec 6 13:24:46 2021 +0500 [SPARK-37360][SQL] Support TimestampNTZ in JSON data source ### What changes were proposed in this pull request? This PR adds support for the TimestampNTZ type in the JSON data source. Most of the functionality has already been added; this patch verifies that writes and reads work for the TimestampNTZ type and adds schema inference depending on the timestamp value format written. The following applies: - If there is a mixture of `TIMESTAMP_NTZ` and `TIMESTAMP_LTZ` values, use `TIMESTAMP_LTZ`. - If there are only `TIMESTAMP_NTZ` values, resolve using the default timestamp type configured with `spark.sql.timestampType`. In addition, I introduced a new JSON option `timestampNTZFormat` which is similar to `timestampFormat` but allows configuring the read/write pattern for `TIMESTAMP_NTZ` types. It is basically a copy of the timestamp pattern but without the timezone. ### Why are the changes needed? The PR fixes issues when writing and reading TimestampNTZ to and from JSON. ### Does this PR introduce _any_ user-facing change? Previously, the JSON data source would infer timestamp values as `TimestampType` when reading a JSON file. Now, the data source infers the timestamp value type based on the format (with or without timezone) and the default timestamp type based on `spark.sql.timestampType`. A new JSON option `timestampNTZFormat` is added to control the way values are formatted during writes or parsed during reads. ### How was this patch tested?
I extended `JsonSuite` with a few unit tests to verify that write-read roundtrip works for `TIMESTAMP_NTZ` and `TIMESTAMP_LTZ` values. Closes #34638 from sadikovi/timestamp-ntz-support-json. Authored-by: Ivan Sadikov Signed-off-by: Max Gekk --- docs/sql-data-sources-json.md | 10 +- .../spark/sql/catalyst/json/JSONOptions.scala | 9 +- .../spark/sql/catalyst/json/JacksonGenerator.scala | 2 +- .../spark/sql/catalyst/json/JacksonParser.scala| 4 +- .../spark/sql/catalyst/json/JsonInferSchema.scala | 12 ++ .../sql/execution/datasources/json/JsonSuite.scala | 194 - 6 files changed, 216 insertions(+), 15 deletions(-) diff --git a/docs/sql-data-sources-json.md b/docs/sql-data-sources-json.md index 5e3bd2b..b5f27aa 100644 --- a/docs/sql-data-sources-json.md +++ b/docs/sql-data-sources-json.md @@ -9,9 +9,9 @@ license: | The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at - + http://www.apache.org/licenses/LICENSE-2.0 - + Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. @@ -197,6 +197,12 @@ Data source options of JSON can be set via: read/write +timestampNTZFormat +-MM-dd'T'HH:mm:ss[.SSS] +Sets the string that indicates a timestamp without timezone format. Custom date formats follow the formats at https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html";>Datetime Patterns. This applies to timestamp without timezone type, note that zone-offset and time-zone components are not supported when writing or reading this data type. +read/write + + multiLine false Parse one record, which may span multiple lines, per file. 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala index 029c014..e801912 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala @@ -106,6 +106,10 @@ private[sql] class JSONOptions( s"${DateFormatter.defaultPattern}'T'HH:mm:ss[.SSS][XXX]" }) + val timestampNTZFormatInRead: Option[String] = parameters.get("timestampNTZFormat") + val timestampNTZFormatInWrite: String = +parameters.getOrElse("timestampNTZFormat", s"${DateFormatter.defaultPattern}'T'HH:mm:ss[.SSS]") + val multiLine = parameters.get("multiLine").map(_.toBoolean).getOrElse(false) /*
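The two inference rules from the commit message (any value carrying a zone offset forces `TIMESTAMP_LTZ`; NTZ-only values fall back to the session default from `spark.sql.timestampType`) can be sketched as follows. The regex and function are illustrative, not Spark's actual `JsonInferSchema` logic:

```python
import re

# A trailing 'Z' or a numeric zone offset marks a zoned (LTZ) timestamp string.
TZ_SUFFIX = re.compile(r"(Z|[+-]\d{2}:?\d{2})$")

def infer_timestamp_type(values, default_type="TIMESTAMP_NTZ"):
    """Pick a timestamp type for a column of timestamp strings:
    any zoned value forces TIMESTAMP_LTZ; otherwise use the session
    default (the stand-in for spark.sql.timestampType)."""
    has_tz = any(TZ_SUFFIX.search(v) for v in values)
    return "TIMESTAMP_LTZ" if has_tz else default_type

print(infer_timestamp_type(["2021-12-06T13:24:46"]))                             # TIMESTAMP_NTZ
print(infer_timestamp_type(["2021-12-06T13:24:46", "2021-12-06T08:24:46+05:00"]))  # TIMESTAMP_LTZ
```

The mixed-value case resolves to `TIMESTAMP_LTZ` because an LTZ value cannot be losslessly represented without its zone, while an NTZ value can always be widened.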
[spark] branch master updated (72669b5 -> 0b959b5)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 72669b5 [SPARK-37004][PYTHON] Upgrade to Py4J 0.10.9.3 add 0b959b5 [SPARK-37552][SQL] Add the `convert_timezone()` function No new revisions were added by this update. Summary of changes: .../sql/catalyst/analysis/FunctionRegistry.scala | 1 + .../catalyst/expressions/datetimeExpressions.scala | 53 ++ .../spark/sql/catalyst/util/DateTimeUtils.scala| 17 +++ .../expressions/DateExpressionsSuite.scala | 40 .../sql/catalyst/util/DateTimeUtilsSuite.scala | 24 ++ .../sql-functions/sql-expression-schema.md | 3 +- .../resources/sql-tests/inputs/timestamp-ntz.sql | 2 + .../sql-tests/results/timestamp-ntz.sql.out| 10 +++- 8 files changed, 148 insertions(+), 2 deletions(-)
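The new `convert_timezone()` function rebases a timestamp-without-time-zone from a source zone to a target zone. Its semantics can be approximated with Python's `zoneinfo`; this is a sketch of the behavior, not Spark's implementation, and Spark's exact argument handling may differ:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def convert_timezone(source_tz: str, target_tz: str, ts_ntz: datetime) -> datetime:
    """Interpret a naive (NTZ) timestamp as wall-clock time in source_tz,
    then express the same instant as wall-clock time in target_tz,
    returning another naive timestamp."""
    aware = ts_ntz.replace(tzinfo=ZoneInfo(source_tz))
    return aware.astimezone(ZoneInfo(target_tz)).replace(tzinfo=None)

# 12:00 in UTC is 15:00 in Europe/Moscow (UTC+3):
print(convert_timezone("UTC", "Europe/Moscow", datetime(2021, 12, 7, 12, 0)))
```

The key point is that the input and output are both NTZ values: only the wall-clock reading changes, because the function fixes the instant via the source zone before projecting it into the target zone.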
[spark] branch master updated (5edd959 -> c7dd2d5)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 5edd959 [SPARK-37561][SQL] Avoid loading all functions when obtaining hive's DelegationToken add c7dd2d5 [SPARK-36137][SQL][FOLLOWUP] Correct the config key in error msg No new revisions were added by this update. Summary of changes: .../main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
[spark] branch master updated (fba219c -> 5e4d664)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from fba219c [SPARK-37622][K8S] Support K8s executor rolling policy add 5e4d664 [SPARK-37591][SQL] Support the GCM mode by `aes_encrypt()`/`aes_decrypt()` No new revisions were added by this update. Summary of changes: .../catalyst/expressions/ExpressionImplUtils.java | 49 -- .../spark/sql/catalyst/expressions/misc.scala | 20 + .../sql-functions/sql-expression-schema.md | 2 +- .../apache/spark/sql/DataFrameFunctionsSuite.scala | 1 + .../org/apache/spark/sql/MiscFunctionsSuite.scala | 19 + 5 files changed, 70 insertions(+), 21 deletions(-)
[spark] branch master updated: [SPARK-37575][SQL] null values should be saved as nothing rather than quoted empty Strings "" by default settings
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6a59fba [SPARK-37575][SQL] null values should be saved as nothing rather than quoted empty Strings "" by default settings 6a59fba is described below commit 6a59fba248359fb2614837fe8781dc63ac8fdc4c Author: wayneguow AuthorDate: Tue Dec 14 11:26:34 2021 +0300 [SPARK-37575][SQL] null values should be saved as nothing rather than quoted empty Strings "" by default settings ### What changes were proposed in this pull request? Fix the bug that null values are saved as quoted empty strings "" (the same as empty strings) rather than nothing with default CSV settings since Spark 2.4. ### Why are the changes needed? This is an unexpected bug; if we don't fix it, we can't distinguish null values from empty strings in saved CSV files. As mentioned in the [Spark SQL migration guide](https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-23-to-24) (2.3 => 2.4), empty strings are saved as quoted empty strings "", and null values are saved as nothing since Spark 2.4. > Since Spark 2.4, empty strings are saved as quoted empty strings "". In version 2.3 and earlier, empty strings are equal to null values and do not reflect to any characters in saved CSV files. For example, the row of "a", null, "", 1 was written as a,,,1. Since Spark 2.4, the same row is saved as a,,"",1. To restore the previous behavior, set the CSV option emptyValue to empty (not quoted) string. But actually, we found that null values are also saved as quoted empty strings "", the same as empty strings. For the following code: ```scala Seq(("Tesla", null.asInstanceOf[String], "")) .toDF("make", "comment", "blank") .coalesce(1) .write.csv(path) ``` actual results: >Tesla,"","" expected results: >Tesla,,"" ### Does this PR introduce _any_ user-facing change?
Yes. With this bug fixed, null values are written as nothing rather than as quoted empty strings "". Users can set nullValue to "\"\"" (the same as emptyValueInWrite's default value) to restore the previous behavior since 2.4. ### How was this patch tested? Adding a test case. Closes #34853 from wayneguow/SPARK-37575. Lead-authored-by: wayneguow Co-authored-by: Wayne Guo Signed-off-by: Max Gekk --- .../apache/spark/sql/catalyst/csv/UnivocityGenerator.scala | 2 -- .../spark/sql/execution/datasources/csv/CSVSuite.scala | 13 - 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala index 10cccd5..9d65824 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala @@ -94,8 +94,6 @@ class UnivocityGenerator( while (i < row.numFields) { if (!row.isNullAt(i)) { values(i) = valueConverters(i).apply(row, i) - } else { -values(i) = options.nullValue } i += 1 } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala index 8c8079f..c7328d9 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala @@ -805,6 +805,17 @@ abstract class CSVSuite } } + test("SPARK-37575: null values should be saved as nothing rather than " + +"quoted empty Strings \"\" with default settings") { +withTempPath { path => + Seq(("Tesla", null: String, "")) +.toDF("make", "comment", "blank") +.write +.csv(path.getCanonicalPath) + checkAnswer(spark.read.text(path.getCanonicalPath), Row("Tesla,,\"\"")) +} + } +
test("save csv with compression codec option") { withTempDir { dir => val csvDir = new File(dir, "csv").getCanonicalPath @@ -1769,7 +1780,7 @@ abstract class CSVSuite (1, "John Doe"), (2, "-"), (3, "-"), -(4, "-") +(4, null) ).toDF("id", "name") checkAnswer(computed, expected)
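The fix removes the `else` branch that wrote `options.nullValue` for NULL fields, so a NULL field now emits nothing between the delimiters while a genuine empty string still round-trips as a quoted `""`. A minimal Python model of the fixed writer (illustrative only; the real generator is Univocity-based):

```python
def write_csv_row(fields, empty_value='""'):
    """Render one CSV row: None (SQL NULL) emits nothing between the
    delimiters, while '' emits the quoted empty value."""
    def fmt(value):
        if value is None:
            return ""           # NULL: nothing at all
        if value == "":
            return empty_value  # empty string: quoted ""
        return value
    return ",".join(fmt(v) for v in fields)

print(write_csv_row(["Tesla", None, ""]))  # Tesla,,""
```

This is exactly the distinction the commit message's expected output `Tesla,,""` relies on: a reader can now tell the NULL `comment` column apart from the empty `blank` column.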
[spark] branch branch-3.2 updated: [SPARK-37575][SQL] null values should be saved as nothing rather than quoted empty Strings "" by default settings
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 62e4202 [SPARK-37575][SQL] null values should be saved as nothing rather than quoted empty Strings "" by default settings 62e4202 is described below commit 62e4202b65d76b05f9f9a15819a631524c6e7985 Author: wayneguow AuthorDate: Tue Dec 14 11:26:34 2021 +0300 [SPARK-37575][SQL] null values should be saved as nothing rather than quoted empty Strings "" by default settings ### What changes were proposed in this pull request? Fix the bug that null values are saved as quoted empty strings "" (the same as empty strings) rather than nothing with default CSV settings since Spark 2.4. ### Why are the changes needed? This is an unexpected bug; if we don't fix it, we can't distinguish null values from empty strings in saved CSV files. As mentioned in the [Spark SQL migration guide](https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-23-to-24) (2.3 => 2.4), empty strings are saved as quoted empty strings "", and null values are saved as nothing since Spark 2.4. > Since Spark 2.4, empty strings are saved as quoted empty strings "". In version 2.3 and earlier, empty strings are equal to null values and do not reflect to any characters in saved CSV files. For example, the row of "a", null, "", 1 was written as a,,,1. Since Spark 2.4, the same row is saved as a,,"",1. To restore the previous behavior, set the CSV option emptyValue to empty (not quoted) string. But actually, we found that null values are also saved as quoted empty strings "", the same as empty strings.
For the following code: ```scala Seq(("Tesla", null.asInstanceOf[String], "")) .toDF("make", "comment", "blank") .coalesce(1) .write.csv(path) ``` actual results: >Tesla,"","" expected results: >Tesla,,"" ### Does this PR introduce _any_ user-facing change? Yes. With this bug fixed, null values are written as nothing rather than as quoted empty strings "". Users can set nullValue to "\"\"" (the same as emptyValueInWrite's default value) to restore the previous behavior since 2.4. ### How was this patch tested? Adding a test case. Closes #34853 from wayneguow/SPARK-37575. Lead-authored-by: wayneguow Co-authored-by: Wayne Guo Signed-off-by: Max Gekk (cherry picked from commit 6a59fba248359fb2614837fe8781dc63ac8fdc4c) Signed-off-by: Max Gekk --- .../apache/spark/sql/catalyst/csv/UnivocityGenerator.scala | 2 -- .../spark/sql/execution/datasources/csv/CSVSuite.scala | 13 - 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala index 2abf7bf..8504877 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala @@ -84,8 +84,6 @@ class UnivocityGenerator( while (i < row.numFields) { if (!row.isNullAt(i)) { values(i) = valueConverters(i).apply(row, i) - } else { -values(i) = options.nullValue } i += 1 } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala index 7efdf7c..a472221 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala @@ -804,6 +804,17 @@ abstract class CSVSuite }
} + test("SPARK-37575: null values should be saved as nothing rather than " + +"quoted empty Strings \"\" with default settings") { +withTempPath { path => + Seq(("Tesla", null: String, "")) +.toDF("make", "comment", "blank") +.write +.csv(path.getCanonicalPath) + checkAnswer(spark.read.text(path.getCanonicalPath), Row("Tesla,,\"\"")) +} + } + test("save csv with compression codec option") { withTempDir { dir => val csvDir = new File(dir, "csv").getCanonicalPath @@ -1574,7 +1585,7 @@ a
[spark] branch master updated: [SPARK-37676][SQL] Support ANSI Aggregation Function: percentile_cont
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 73789da [SPARK-37676][SQL] Support ANSI Aggregation Function: percentile_cont 73789da is described below commit 73789da962c9037bde21a53fb5826b10475658fe Author: Jiaan Geng AuthorDate: Mon Dec 27 16:12:43 2021 +0300 [SPARK-37676][SQL] Support ANSI Aggregation Function: percentile_cont ### What changes were proposed in this pull request? `PERCENTILE_CONT` is an ANSI aggregate function. Mainstream databases that support `percentile_cont` are shown below: **Postgresql** https://www.postgresql.org/docs/9.4/functions-aggregate.html **Teradata** https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/cPkFySIBORL~M938Zv07Cg **Snowflake** https://docs.snowflake.com/en/sql-reference/functions/percentile_cont.html **Oracle** https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/PERCENTILE_CONT.html#GUID-CA259452-A565-41B3-A4F4-DD74B66CEDE0 **H2** http://www.h2database.com/html/functions-aggregate.html#percentile_cont **Sybase** https://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc01776.1601/doc/html/san1278453109663.html **Exasol** https://docs.exasol.com/sql_references/functions/alphabeticallistfunctions/percentile_cont.htm **RedShift** https://docs.aws.amazon.com/redshift/latest/dg/r_PERCENTILE_CONT.html **Yellowbrick** https://www.yellowbrick.com/docs/2.2/ybd_sqlref/percentile_cont.html **Mariadb** https://mariadb.com/kb/en/percentile_cont/ **Phoenix** http://phoenix.incubator.apache.org/language/functions.html#percentile_cont **Singlestore** https://docs.singlestore.com/db/v7.6/en/reference/sql-reference/window-functions/percentile_cont-and-median.html ### Why are the changes needed? `PERCENTILE_CONT` is very useful. Exposing the expression can make the migration from other systems to Spark SQL easier.
### Does this PR introduce _any_ user-facing change? 'Yes'. New feature. ### How was this patch tested? New tests. Closes #34936 from beliefer/SPARK-37676. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- docs/sql-ref-ansi-compliance.md| 2 ++ .../apache/spark/sql/catalyst/parser/SqlBase.g4| 6 .../spark/sql/catalyst/parser/AstBuilder.scala | 15 +++- .../sql/catalyst/parser/PlanParserSuite.scala | 24 - .../test/resources/sql-tests/inputs/group-by.sql | 18 +- .../resources/sql-tests/results/group-by.sql.out | 41 +- 6 files changed, 102 insertions(+), 4 deletions(-) diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md index 7b5bde4..1b4a778 100644 --- a/docs/sql-ref-ansi-compliance.md +++ b/docs/sql-ref-ansi-compliance.md @@ -494,6 +494,7 @@ Below is a list of all the keywords in Spark SQL. |PARTITIONED|non-reserved|non-reserved|non-reserved| |PARTITIONS|non-reserved|non-reserved|non-reserved| |PERCENT|non-reserved|non-reserved|non-reserved| +|PERCENTILE_CONT|reserved|non-reserved|non-reserved| |PIVOT|non-reserved|non-reserved|non-reserved| |PLACING|non-reserved|non-reserved|non-reserved| |POSITION|non-reserved|non-reserved|reserved| @@ -594,5 +595,6 @@ Below is a list of all the keywords in Spark SQL. 
|WHERE|reserved|non-reserved|reserved| |WINDOW|non-reserved|non-reserved|reserved| |WITH|reserved|non-reserved|reserved| +|WITHIN|reserved|non-reserved|reserved| |YEAR|non-reserved|non-reserved|non-reserved| |ZONE|non-reserved|non-reserved|non-reserved| diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 6511489..5037520 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -888,6 +888,8 @@ primaryExpression FROM srcStr=valueExpression ')' #trim | OVERLAY '(' input=valueExpression PLACING replace=valueExpression FROM position=valueExpression (FOR length=valueExpression)? ')' #overlay +| PERCENTILE_CONT '(' percentage=valueExpression ')' + WITHIN GROUP '(' ORDER BY sortItem ')' #percentile ; constant @@ -1475,6 +1477,7 @@ nonReserved | PARTITION | PARTITIONED | PARTITIONS +| PERCENTILE_CONT | PERCENTLIT | PIVOT | PLACING @@ -1570,6 +1573,7 @@ nonReserved | WHERE | WINDOW | WITH +|
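`PERCENTILE_CONT` computes the value at a given fraction of the ordered distribution, interpolating linearly between the two closest rows. A standalone Python sketch of the standard continuous-percentile rule (this only shows the math; Spark's aggregate is implemented quite differently):

```python
def percentile_cont(values, fraction):
    """Continuous percentile: sort the values, locate the fractional row
    position, and linearly interpolate between its two neighbors."""
    xs = sorted(values)
    pos = fraction * (len(xs) - 1)     # continuous row position in [0, n-1]
    lo = int(pos)                      # lower neighbor index
    hi = min(lo + 1, len(xs) - 1)      # upper neighbor index (clamped)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

print(percentile_cont([1, 2, 3, 4], 0.5))   # 2.5
print(percentile_cont([10, 20, 30], 0.25))  # 15.0
```

Unlike `PERCENTILE_DISC`, which returns an actual row value, `PERCENTILE_CONT` may return a value that does not occur in the input, as the `2.5` above shows.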
[spark] branch master updated: [SPARK-34755][SQL] Support the utils for transform number format
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a6576de [SPARK-34755][SQL] Support the utils for transform number format a6576de is described below commit a6576de9719204f6a87d2fc5e2e344bd1d0017a3 Author: Jiaan Geng AuthorDate: Wed Dec 29 11:07:06 2021 +0300 [SPARK-34755][SQL] Support the utils for transform number format ### What changes were proposed in this pull request? Data Type Formatting Functions: `to_number` and `to_char` are very useful. The implementation differs considerably between `PostgreSQL`, `Oracle`, and `Phoenix`. This PR follows the implementation of `to_number` in `Oracle`, which applies strict parameter verification, and the implementation in `Phoenix`, which uses BigDecimal. This PR supports the following patterns for numeric formatting:

Pattern | Description
-- | --
9 | Value with the specified number of digits
0 | Value with leading zeros
. (period) | Decimal point
, (comma) | Group (thousand) separator
S | Sign anchored to number (uses locale)
$ | A value with a leading dollar sign
D | Decimal point (uses locale)
G | Group separator (uses locale)

Several mainstream databases support the syntax.
**PostgreSQL:** https://www.postgresql.org/docs/12/functions-formatting.html
**Oracle:** https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/TO_NUMBER.html#GUID-D4807212-AFD7-48A7-9AED-BEC3E8809866
**Vertica:** https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Formatting/TO_NUMBER.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CFormatting%20Functions%7C_7
**Redshift:** https://docs.aws.amazon.com/redshift/latest/dg/r_TO_NUMBER.html
**DB2:** https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.sqls.doc/ids_sqs_1544.htm
**Teradata:** https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/TH2cDXBn6tala29S536nqg
**Snowflake:** https://docs.snowflake.net/manuals/sql-reference/functions/to_decimal.html
**Exasol:** https://docs.exasol.com/sql_references/functions/alphabeticallistfunctions/to_number.htm#TO_NUMBER
**Phoenix:** http://phoenix.incubator.apache.org/language/functions.html#to_number
**Singlestore:** https://docs.singlestore.com/v7.3/reference/sql-reference/numeric-functions/to-number/
**Intersystems:** https://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=RSQL_TONUMBER

Note: based on an offline discussion with cloud-fan ten months ago, this PR only implements the utils for transforming number formats, because the utils should be reviewed carefully first.

### Why are the changes needed?
`to_number` and `to_char` are very useful for converting formatted currency values to numbers.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Jenkins test.

Closes #31847 from beliefer/SPARK-34755.
Lead-authored-by: Jiaan Geng Co-authored-by: gengjiaan Signed-off-by: Max Gekk --- .../spark/sql/catalyst/util/NumberUtils.scala | 189 .../spark/sql/errors/QueryCompilationErrors.scala | 8 + .../spark/sql/errors/QueryExecutionErrors.scala| 6 + .../spark/sql/catalyst/util/NumberUtilsSuite.scala | 317 + 4 files changed, 520 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberUtils.scala new file mode 100644 index 000..6efde2a --- /dev/null +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberUtils.scala @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.util + +import java.math.BigDecimal +import java.text.{DecimalFormat, NumberFormat, Pa
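The pattern table in the commit message above maps onto `java.text.DecimalFormat`, which the new `NumberUtils` imports. A minimal sketch of DecimalFormat-based parsing (not Spark's actual implementation; the SQL-to-Java pattern mapping shown is an assumption for illustration):

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

public class ToNumberSketch {
    // Parse a formatted numeric string into a BigDecimal using java.text,
    // the same underlying machinery NumberUtils builds on.
    static BigDecimal toNumber(String input, String javaPattern) {
        // Fix the symbols to a known locale so ',' groups and '.' is the decimal point
        DecimalFormat df = new DecimalFormat(javaPattern, DecimalFormatSymbols.getInstance(Locale.US));
        df.setParseBigDecimal(true); // return BigDecimal, as the Phoenix-style implementation does
        try {
            return (BigDecimal) df.parse(input);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid number format: " + input, e);
        }
    }

    public static void main(String[] args) {
        // The SQL pattern "9,999.99" corresponds roughly to the Java pattern "#,##0.##"
        System.out.println(toNumber("4,543.33", "#,##0.##")); // 4543.33
    }
}
```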
[spark] branch master updated (4d5ea5e -> 8fef5bb)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 4d5ea5e [SPARK-37153][PYTHON] Inline type hints for python/pyspark/profiler.py add 8fef5bb [SPARK-37979][SQL] Switch to more generic error classes in AES functions No new revisions were added by this update. Summary of changes: core/src/main/resources/error/error-classes.json | 19 - .../spark/sql/errors/QueryExecutionErrors.scala| 19 +++-- .../sql/errors/QueryExecutionErrorsSuite.scala | 48 -- 3 files changed, 50 insertions(+), 36 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-37937][SQL] Use error classes in the parsing errors of lateral join
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6347857 [SPARK-37937][SQL] Use error classes in the parsing errors of lateral join 6347857 is described below commit 6347857f0bad105541971283f79281c490f6bb18 Author: Terry Kim AuthorDate: Thu Feb 3 14:56:11 2022 +0300 [SPARK-37937][SQL] Use error classes in the parsing errors of lateral join ### What changes were proposed in this pull request? In the PR, I propose to use the following error classes for the parsing errors of lateral joins: - `INVALID_SQL_SYNTAX ` - `UNSUPPORTED_FEATURE ` These new error classes are added to `error-classes.json`. ### Why are the changes needed? Porting the parsing errors for lateral join to the new error framework should improve user experience with Spark SQL. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added new test suite Closes #35328 from imback82/SPARK-37937. 
Authored-by: Terry Kim Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 4 ++ .../spark/sql/errors/QueryParsingErrors.scala | 8 +-- .../sql/catalyst/parser/ErrorParserSuite.scala | 10 --- .../sql-tests/results/join-lateral.sql.out | 4 +- .../spark/sql/errors/QueryParsingErrorsSuite.scala | 81 ++ 5 files changed, 91 insertions(+), 16 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index a1ac99f..06ce22a 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -93,6 +93,10 @@ "message" : [ "The value of parameter(s) '%s' in %s is invalid: %s" ], "sqlState" : "22023" }, + "INVALID_SQL_SYNTAX" : { +"message" : [ "Invalid SQL syntax: %s" ], +"sqlState" : "42000" + }, "MAP_KEY_DOES_NOT_EXIST" : { "message" : [ "Key %s does not exist. If necessary set %s to false to bypass this error." ] }, diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala index 938bbfd..6bcd20c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala @@ -102,19 +102,19 @@ object QueryParsingErrors { } def lateralJoinWithNaturalJoinUnsupportedError(ctx: ParserRuleContext): Throwable = { -new ParseException("LATERAL join with NATURAL join is not supported", ctx) +new ParseException("UNSUPPORTED_FEATURE", Array("LATERAL join with NATURAL join."), ctx) } def lateralJoinWithUsingJoinUnsupportedError(ctx: ParserRuleContext): Throwable = { -new ParseException("LATERAL join with USING join is not supported", ctx) +new ParseException("UNSUPPORTED_FEATURE", Array("LATERAL join with USING join."), ctx) } def unsupportedLateralJoinTypeError(ctx: ParserRuleContext, joinType: String): Throwable = { -new 
ParseException(s"Unsupported LATERAL join type $joinType", ctx) +new ParseException("UNSUPPORTED_FEATURE", Array(s"LATERAL join type '$joinType'."), ctx) } def invalidLateralJoinRelationError(ctx: RelationPrimaryContext): Throwable = { -new ParseException(s"LATERAL can only be used with subquery", ctx) +new ParseException("INVALID_SQL_SYNTAX", Array("LATERAL can only be used with subquery."), ctx) } def repetitiveWindowDefinitionError(name: String, ctx: WindowClauseContext): Throwable = { diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ErrorParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ErrorParserSuite.scala index dfc5edc..99051d6 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ErrorParserSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ErrorParserSuite.scala @@ -208,14 +208,4 @@ class ErrorParserSuite extends AnalysisTest { |SELECT b """.stripMargin, 2, 9, 10, msg + " test-table") } - - test("SPARK-35789: lateral join with non-subquery relations") { -val msg = "LATERAL can only be used with subquery" -intercept("SELECT * FROM t1, LATERAL t2", msg) -intercept("SELECT * FROM t1 JOIN LATERAL t2",
[spark] branch master updated (6347857 -> b63a577)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 6347857 [SPARK-37937][SQL] Use error classes in the parsing errors of lateral join add b63a577 [SPARK-37941][SQL] Use error classes in the compilation errors of casting No new revisions were added by this update. Summary of changes: core/src/main/resources/error/error-classes.json | 3 + .../spark/sql/errors/QueryCompilationErrors.scala | 25 +++ .../sql/errors/QueryCompilationErrorsSuite.scala | 80 ++ 3 files changed, 96 insertions(+), 12 deletions(-) create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala
[spark] branch master updated: [SPARK-38105][SQL] Use error classes in the parsing errors of joins
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0d56c94 [SPARK-38105][SQL] Use error classes in the parsing errors of joins 0d56c94 is described below commit 0d56c947f10f747ab4b76426b2d6a34a1d3b8277 Author: Tengfei Huang AuthorDate: Sun Feb 6 21:19:29 2022 +0300 [SPARK-38105][SQL] Use error classes in the parsing errors of joins

### What changes were proposed in this pull request?
Migrate the following errors in QueryParsingErrors onto error classes:
1. joinCriteriaUnimplementedError => throw IllegalStateException instead, since it should never happen and is not visible to users; it was introduced by improving exhaustivity in [PR](https://github.com/apache/spark/pull/30455)
2. naturalCrossJoinUnsupportedError => UNSUPPORTED_FEATURE

### Why are the changes needed?
Porting join parsing errors to the new error framework.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
UT added.

Closes #35405 from ivoson/SPARK-38105.
Authored-by: Tengfei Huang Signed-off-by: Max Gekk --- .../scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 2 +- .../scala/org/apache/spark/sql/errors/QueryParsingErrors.scala| 6 +- .../org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala | 8 3 files changed, 10 insertions(+), 6 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index ed2623e..bd43cff 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -1146,7 +1146,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg case Some(c) if c.booleanExpression != null => (baseJoinType, Option(expression(c.booleanExpression))) case Some(c) => -throw QueryParsingErrors.joinCriteriaUnimplementedError(c, ctx) +throw new IllegalStateException(s"Unimplemented joinCriteria: $c") case None if join.NATURAL != null => if (join.LATERAL != null) { throw QueryParsingErrors.lateralJoinWithNaturalJoinUnsupportedError(ctx) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala index 6bcd20c..6d7ed7b 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala @@ -129,12 +129,8 @@ object QueryParsingErrors { new ParseException(s"Cannot resolve window reference '$name'", ctx) } - def joinCriteriaUnimplementedError(join: JoinCriteriaContext, ctx: RelationContext): Throwable = { -new ParseException(s"Unimplemented joinCriteria: $join", ctx) - } - def naturalCrossJoinUnsupportedError(ctx: RelationContext): Throwable = { -new ParseException("NATURAL CROSS JOIN is not supported", ctx) 
+new ParseException("UNSUPPORTED_FEATURE", Array("NATURAL CROSS JOIN."), ctx) } def emptyInputForTableSampleError(ctx: ParserRuleContext): Throwable = { diff --git a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala index 1a213bf..03117b9 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala @@ -78,4 +78,12 @@ class QueryParsingErrorsSuite extends QueryTest with SharedSparkSession { message = "Invalid SQL syntax: LATERAL can only be used with subquery.") } } + + test("UNSUPPORTED_FEATURE: NATURAL CROSS JOIN is not supported") { +validateParsingError( + sqlText = "SELECT * FROM a NATURAL CROSS JOIN b", + errorClass = "UNSUPPORTED_FEATURE", + sqlState = "0A000", + message = "The feature is not supported: NATURAL CROSS JOIN.") + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
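These migrations all follow the same mechanics: `error-classes.json` holds a message template with `%s` placeholders, and the exception constructor receives the error class name plus a parameter array that fills the template. A cut-down sketch of that lookup-and-format step (registry entries taken from the diffs above; class and method names here are hypothetical, not Spark's):

```java
import java.util.Map;

public class ErrorClassSketch {
    // Tiny stand-in for error-classes.json: error class name -> message template.
    static final Map<String, String> MESSAGES = Map.of(
        "UNSUPPORTED_FEATURE", "The feature is not supported: %s",
        "INVALID_SQL_SYNTAX", "Invalid SQL syntax: %s");

    // Resolve the template for an error class and substitute the parameters.
    static String format(String errorClass, Object... params) {
        return String.format(MESSAGES.get(errorClass), params);
    }

    public static void main(String[] args) {
        // Mirrors the expected message in the NATURAL CROSS JOIN test above
        System.out.println(format("UNSUPPORTED_FEATURE", "NATURAL CROSS JOIN."));
        // The feature is not supported: NATURAL CROSS JOIN.
    }
}
```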
[spark] branch master updated (f62b36c -> 65c0bdf)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f62b36c [SPARK-38128][PYTHON][TESTS] Show full stacktrace in tests by default in PySpark tests add 65c0bdf [SPARK-38126][SQL][TESTS] Check the whole message of error classes No new revisions were added by this update. Summary of changes: .../sql/errors/QueryCompilationErrorsSuite.scala | 9 ++- .../sql/errors/QueryExecutionErrorsSuite.scala | 12 ++-- .../spark/sql/errors/QueryParsingErrorsSuite.scala | 64 +- 3 files changed, 62 insertions(+), 23 deletions(-)
[spark] branch master updated (2e703ae -> 08c851d)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 2e703ae [SPARK-38030][SQL] Canonicalization should not remove nullability of AttributeReference dataType add 08c851d [SPARK-37943][SQL] Use error classes in the compilation errors of grouping No new revisions were added by this update. Summary of changes: core/src/main/resources/error/error-classes.json | 3 ++ .../spark/sql/errors/QueryCompilationErrors.scala | 4 ++- .../sql/errors/QueryCompilationErrorsSuite.scala | 35 ++ 3 files changed, 41 insertions(+), 1 deletion(-)
[spark] branch master updated (5f0a92c -> 7688d839)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 5f0a92c [SPARK-38157][SQL] Explicitly set ANSI to false in test timestampNTZ/timestamp.sql and SQLQueryTestSuite to match the expected golden results add 7688d839 [SPARK-38113][SQL] Use error classes in the execution errors of pivoting No new revisions were added by this update. Summary of changes: .../spark/sql/errors/QueryExecutionErrors.scala| 8 +-- .../sql/errors/QueryExecutionErrorsSuite.scala | 27 +- 2 files changed, 32 insertions(+), 3 deletions(-)
[spark] branch master updated (7688d839 -> 53ba6e2)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 7688d839 [SPARK-38113][SQL] Use error classes in the execution errors of pivoting add 53ba6e2 [SPARK-38131][SQL] Use error classes in user-facing exceptions only No new revisions were added by this update. Summary of changes: core/src/main/resources/error/error-classes.json| 4 .../spark/sql/catalyst/analysis/Analyzer.scala | 3 ++- .../sql/catalyst/expressions/csvExpressions.scala | 2 +- .../spark/sql/errors/QueryCompilationErrors.scala | 8 .../spark/sql/errors/QueryExecutionErrors.scala | 5 - .../sql/errors/QueryCompilationErrorsSuite.scala| 21 + 6 files changed, 4 insertions(+), 39 deletions(-)
[spark] branch master updated (17653fb -> 3d285c1)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 17653fb [SPARK-37401][PYTHON][ML] Inline typehints for pyspark.ml.clustering add 3d285c1 [SPARK-38123][SQL] Unified use `DataType` as `targetType` of `QueryExecutionErrors#castingCauseOverflowError` No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/expressions/Cast.scala | 60 -- .../spark/sql/catalyst/util/IntervalUtils.scala| 18 +++ .../spark/sql/errors/QueryExecutionErrors.scala| 4 +- .../scala/org/apache/spark/sql/types/Decimal.scala | 14 ++--- .../org/apache/spark/sql/types/numerics.scala | 10 ++-- .../sql-tests/results/postgreSQL/float4.sql.out| 2 +- .../sql-tests/results/postgreSQL/float8.sql.out| 2 +- .../sql-tests/results/postgreSQL/int8.sql.out | 2 +- 8 files changed, 59 insertions(+), 53 deletions(-)
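`castingCauseOverflowError` is raised when an ANSI-mode cast cannot represent a value in the target `DataType`. A minimal illustration of such an overflow check outside Spark (helper names here are hypothetical):

```java
public class CastOverflowSketch {
    // ANSI-style cast: narrowing a long to int must fail loudly when the value
    // does not fit, instead of silently wrapping as a plain (int) cast would.
    static int castLongToInt(long v) {
        if (v > Integer.MAX_VALUE || v < Integer.MIN_VALUE) {
            throw new ArithmeticException("Casting " + v + " to int causes overflow");
        }
        return (int) v;
    }

    public static void main(String[] args) {
        System.out.println(castLongToInt(42L)); // 42
        try {
            castLongToInt(3_000_000_000L); // exceeds Integer.MAX_VALUE
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage());
        }
    }
}
```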
[spark] branch master updated: [SPARK-38198][SQL] Fix `QueryExecution.debug#toFile` use the passed in `maxFields` when `explainMode` is `CodegenMode`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ff92e85 [SPARK-38198][SQL] Fix `QueryExecution.debug#toFile` use the passed in `maxFields` when `explainMode` is `CodegenMode` ff92e85 is described below commit ff92e85f86d3e36428996695001a23893d406b76 Author: yangjie01 AuthorDate: Mon Feb 14 13:28:11 2022 +0300 [SPARK-38198][SQL] Fix `QueryExecution.debug#toFile` use the passed in `maxFields` when `explainMode` is `CodegenMode`

### What changes were proposed in this pull request?
The `QueryExecution.debug#toFile` method supports passing in `maxFields`, and this parameter is passed down when `explainMode` is `SimpleMode`, `ExtendedMode`, or `CostMode`. But the passed-down `maxFields` was ignored when `explainMode` is `CostMode`, because `QueryExecution#stringWithStats` currently overrides it with `SQLConf.get.maxToStringFields`. This PR removes the override so that the passed-in `maxFields` takes effect.

### Why are the changes needed?
Bug fix

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GA and add a new test case

Closes #35506 from LuciferYang/SPARK-38198.
Authored-by: yangjie01 Signed-off-by: Max Gekk --- .../apache/spark/sql/execution/QueryExecution.scala| 2 -- .../spark/sql/execution/QueryExecutionSuite.scala | 18 ++ 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala index 26c6904..1b08994 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala @@ -304,8 +304,6 @@ class QueryExecution( } private def stringWithStats(maxFields: Int, append: String => Unit): Unit = { -val maxFields = SQLConf.get.maxToStringFields - // trigger to compute stats for logical plans try { // This will trigger to compute stats for all the nodes in the plan, including subqueries, diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala index ecc448f..2c58b53 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala @@ -261,4 +261,22 @@ class QueryExecutionSuite extends SharedSparkSession { val cmdResultExec = projectQe.executedPlan.asInstanceOf[CommandResultExec] assert(cmdResultExec.commandPhysicalPlan.isInstanceOf[ShowTablesExec]) } + + test("SPARK-38198: check specify maxFields when call toFile method") { +withTempDir { dir => + val path = dir.getCanonicalPath + "/plans.txt" + // Define a dataset with 6 columns + val ds = spark.createDataset(Seq((0, 1, 2, 3, 4, 5), (6, 7, 8, 9, 10, 11))) + // `CodegenMode` and `FormattedMode` doesn't use the maxFields, so not tested in this case + Seq(SimpleMode.name, ExtendedMode.name, CostMode.name).foreach { modeName => +val maxFields = 3 +ds.queryExecution.debug.toFile(path, explainMode 
= Some(modeName), maxFields = maxFields) +Utils.tryWithResource(Source.fromFile(path)) { source => + val tableScan = source.getLines().filter(_.contains("LocalTableScan")) + assert(tableScan.exists(_.contains("more fields")), +s"Specify maxFields = $maxFields doesn't take effect when explainMode is $modeName") +} + } +} + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
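The `maxFields` knob exercised by the new test bounds how many fields a plan node prints before collapsing the rest into a "more fields" marker. A rough sketch of that truncation idea (hypothetical helper; Spark's real one is `Utils.truncatedString`, whose exact formatting differs):

```java
import java.util.List;

public class TruncateFields {
    // Show at most maxFields entries, then summarize the remainder,
    // so huge schemas do not blow up explain output.
    static String truncatedString(List<String> fields, int maxFields) {
        if (fields.size() <= maxFields) {
            return String.join(", ", fields);
        }
        int hidden = fields.size() - maxFields;
        return String.join(", ", fields.subList(0, maxFields)) + ", ... " + hidden + " more fields";
    }

    public static void main(String[] args) {
        // A 6-column scan printed with maxFields = 3, as in the test above
        System.out.println(truncatedString(List.of("a", "b", "c", "d", "e", "f"), 3));
        // a, b, c, ... 3 more fields
    }
}
```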
[spark] branch master updated (ff92e85 -> c8b34ab)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from ff92e85 [SPARK-38198][SQL] Fix `QueryExecution.debug#toFile` use the passed in `maxFields` when `explainMode` is `CodegenMode` add c8b34ab [SPARK-38097][SQL][TESTS] Improved the error message for pivoting unsupported column No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala | 8 .../scala/org/apache/spark/sql/RelationalGroupedDataset.scala| 9 - .../org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala | 7 +++ 3 files changed, 19 insertions(+), 5 deletions(-)
[spark] branch branch-3.2 updated (75c7726 -> 940ac0c)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git. from 75c7726 [SPARK-37498][PYTHON] Add eventually for test_reuse_worker_of_parallelize_range add 940ac0c [SPARK-38198][SQL][3.2] Fix QueryExecution.debug#toFile use the passed in maxFields when explainMode is CodegenMode No new revisions were added by this update. Summary of changes: .../apache/spark/sql/execution/QueryExecution.scala| 2 -- .../spark/sql/execution/QueryExecutionSuite.scala | 18 ++ 2 files changed, 18 insertions(+), 2 deletions(-)
[spark] branch master updated (ea1f922 -> a9a792b3)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from ea1f922 [SPARK-37707][SQL][FOLLOWUP] Allow implicitly casting Date type to AnyTimestampType under ANSI mode add a9a792b3 [SPARK-38199][SQL] Delete the unused `dataType` specified in the definition of `IntervalColumnAccessor` No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/execution/columnar/ColumnAccessor.scala | 2 +- .../apache/spark/sql/execution/columnar/GenerateColumnAccessor.scala| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
[spark] branch master updated (837248a -> 3a7eafd)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 837248a [MINOR][DOC] Fix documentation for structured streaming - addListener add 3a7eafd [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function No new revisions were added by this update. Summary of changes: docs/sql-ref-ansi-compliance.md| 1 + .../apache/spark/sql/catalyst/parser/SqlBase.g4| 4 ++ .../sql/catalyst/analysis/FunctionRegistry.scala | 1 + .../catalyst/expressions/datetimeExpressions.scala | 84 ++ .../spark/sql/catalyst/parser/AstBuilder.scala | 11 +++ .../spark/sql/catalyst/util/DateTimeUtils.scala| 36 ++ .../spark/sql/errors/QueryExecutionErrors.scala| 6 ++ .../expressions/DateExpressionsSuite.scala | 62 .../sql/catalyst/util/DateTimeUtilsSuite.scala | 36 +- .../sql-functions/sql-expression-schema.md | 3 +- .../test/resources/sql-tests/inputs/timestamp.sql | 6 ++ .../sql-tests/results/ansi/timestamp.sql.out | 34 - .../sql-tests/results/datetime-legacy.sql.out | 34 - .../resources/sql-tests/results/timestamp.sql.out | 34 - .../results/timestampNTZ/timestamp-ansi.sql.out| 34 - .../results/timestampNTZ/timestamp.sql.out | 34 - .../sql/errors/QueryExecutionErrorsSuite.scala | 12 +++- 17 files changed, 424 insertions(+), 8 deletions(-)
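`TIMESTAMPADD(unit, quantity, ts)`, added by the commit above, shifts a timestamp by an integral number of the given unit. A rough sketch of the semantics using `java.time` (hypothetical helper; Spark supports more units and reports unknown units through `QueryExecutionErrors`):

```java
import java.time.LocalDateTime;

public class TimestampAddSketch {
    // Dispatch on the unit keyword and delegate to java.time's calendar-aware
    // arithmetic (months and years respect varying month lengths).
    static LocalDateTime timestampAdd(String unit, long quantity, LocalDateTime ts) {
        switch (unit.toUpperCase()) {
            case "YEAR":   return ts.plusYears(quantity);
            case "MONTH":  return ts.plusMonths(quantity);
            case "DAY":    return ts.plusDays(quantity);
            case "HOUR":   return ts.plusHours(quantity);
            case "MINUTE": return ts.plusMinutes(quantity);
            case "SECOND": return ts.plusSeconds(quantity);
            default: throw new IllegalArgumentException("Unknown unit: " + unit);
        }
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2022, 2, 14, 13, 28, 11);
        System.out.println(timestampAdd("MONTH", 1, ts)); // 2022-03-14T13:28:11
    }
}
```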