[spark] branch master updated: [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new f824d058b14 [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`
f824d058b14 is described below

commit f824d058b14e3c58b1c90f64fefc45fac105c7dd
Author: Koray Beyaz
AuthorDate: Thu Aug 3 10:57:26 2023 +0500

    [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`

    ### What changes were proposed in this pull request?

    - Rename `_LEGACY_ERROR_TEMP_2175` as `RULE_ID_NOT_FOUND`.
    - Add a test case for the error class.

    ### Why are the changes needed?

    We are migrating onto error classes.

    ### Does this PR introduce _any_ user-facing change?

    Yes, the error message will include the error class name.

    ### How was this patch tested?

    `testOnly *RuleIdCollectionSuite` and GitHub Actions.

    Closes #40991 from kori73/SPARK-42330.

    Lead-authored-by: Koray Beyaz
    Co-authored-by: Koray Beyaz
    Signed-off-by: Max Gekk
---
 common/utils/src/main/resources/error/error-classes.json    | 11 ++-
 docs/sql-error-conditions.md                                |  6 ++
 .../org/apache/spark/sql/errors/QueryExecutionErrors.scala  |  5 ++---
 .../apache/spark/sql/errors/QueryExecutionErrorsSuite.scala | 11 +++
 4 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json
index a9619b97bd9..20f2ab4eb24 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -2471,6 +2471,12 @@
     ],
     "sqlState" : "42883"
   },
+  "RULE_ID_NOT_FOUND" : {
+    "message" : [
+      "Not found an id for the rule name \"<ruleName>\". Please modify RuleIdCollection.scala if you are adding a new rule."
+    ],
+    "sqlState" : "22023"
+  },
   "SCALAR_SUBQUERY_IS_IN_GROUP_BY_OR_AGGREGATE_FUNCTION" : {
     "message" : [
       "The correlated scalar subquery '<sqlExpr>' is neither present in GROUP BY, nor in an aggregate function. Add it to GROUP BY using ordinal position or wrap it in `first()` (or `first_value`) if you don't care which value you get."
@@ -5489,11 +5495,6 @@
       "."
     ]
   },
-  "_LEGACY_ERROR_TEMP_2175" : {
-    "message" : [
-      "Rule id not found for <ruleName>. Please modify RuleIdCollection.scala if you are adding a new rule."
-    ]
-  },
   "_LEGACY_ERROR_TEMP_2176" : {
     "message" : [
       "Cannot create array with <numElements> elements of data due to exceeding the limit <maxRoundedArrayLength> elements for ArrayData. <additionalErrorMessage>"
diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md
index 161f3bdbef1..5609d60f974 100644
--- a/docs/sql-error-conditions.md
+++ b/docs/sql-error-conditions.md
@@ -1586,6 +1586,12 @@ The function `<routineName>` cannot be found. Verify the spelling and correctnes
 If you did not qualify the name with a schema and catalog, verify the current_schema() output, or qualify the name with the correct schema and catalog.
 To tolerate the error on drop use DROP FUNCTION IF EXISTS.
 
+### RULE_ID_NOT_FOUND
+
+[SQLSTATE: 22023](sql-error-conditions-sqlstates.html#class-22-data-exception)
+
+Not found an id for the rule name "`<ruleName>`". Please modify RuleIdCollection.scala if you are adding a new rule.
+
 ### SCALAR_SUBQUERY_IS_IN_GROUP_BY_OR_AGGREGATE_FUNCTION
 
 SQLSTATE: none assigned
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
index 3622ffebb74..45b5d6b6692 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
@@ -1584,9 +1584,8 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase with ExecutionE
   def ruleIdNotFoundForRuleError(ruleName: String): Throwable = {
     new SparkException(
-      errorClass = "_LEGACY_ERROR_TEMP_2175",
-      messageParameters = Map(
-        "ruleName" -> ruleName),
+      errorClass = "RULE_ID_NOT_FOUND",
+      messageParameters = Map("ruleName" -> ruleName),
       cause = null)
   }
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala
index e70d04b7b5a..ae1c0a86a14 100644
--- a/sql/core/src/test/sca
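The `error-classes.json` entry above is a message template: `<ruleName>`-style placeholders are filled in from the `messageParameters` map passed to `SparkException`, and the error class name plus SQLSTATE are prepended to the user-visible message. A minimal, illustrative sketch of that substitution mechanism in Python (the registry layout mirrors the JSON above, but `format_error` and its exact output format are our stand-ins, not Spark's implementation):

```python
import re

# A tiny stand-in for error-classes.json (illustrative subset).
ERROR_CLASSES = {
    "RULE_ID_NOT_FOUND": {
        "message": [
            'Not found an id for the rule name "<ruleName>". '
            "Please modify RuleIdCollection.scala if you are adding a new rule."
        ],
        "sqlState": "22023",
    },
}

def format_error(error_class: str, parameters: dict) -> str:
    """Join the template lines and substitute <param> placeholders."""
    entry = ERROR_CLASSES[error_class]
    template = "".join(entry["message"])
    message = re.sub(r"<(\w+)>", lambda m: str(parameters[m.group(1)]), template)
    return f"[{error_class}] {message} SQLSTATE: {entry['sqlState']}"

print(format_error("RULE_ID_NOT_FOUND", {"ruleName": "MyNewRule"}))
```

Keeping messages in a single JSON registry is what makes renames like `_LEGACY_ERROR_TEMP_2175` → `RULE_ID_NOT_FOUND` a data change plus a one-line code change, rather than a hunt through string concatenations.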
[spark] branch branch-3.5 updated: [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new a1ca1e6e763 [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`
a1ca1e6e763 is described below

commit a1ca1e6e7633c3fbb36427a82635cda7d21f1dab
Author: Koray Beyaz
AuthorDate: Thu Aug 3 10:57:26 2023 +0500

    [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`

    ### What changes were proposed in this pull request?

    - Rename `_LEGACY_ERROR_TEMP_2175` as `RULE_ID_NOT_FOUND`.
    - Add a test case for the error class.

    ### Why are the changes needed?

    We are migrating onto error classes.

    ### Does this PR introduce _any_ user-facing change?

    Yes, the error message will include the error class name.

    ### How was this patch tested?

    `testOnly *RuleIdCollectionSuite` and GitHub Actions.

    Closes #40991 from kori73/SPARK-42330.

    Lead-authored-by: Koray Beyaz
    Co-authored-by: Koray Beyaz
    Signed-off-by: Max Gekk
    (cherry picked from commit f824d058b14e3c58b1c90f64fefc45fac105c7dd)
    Signed-off-by: Max Gekk
---
 common/utils/src/main/resources/error/error-classes.json    | 11 ++-
 docs/sql-error-conditions.md                                |  6 ++
 .../org/apache/spark/sql/errors/QueryExecutionErrors.scala  |  5 ++---
 .../apache/spark/sql/errors/QueryExecutionErrorsSuite.scala | 11 +++
 4 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json
index df425d7b2df..d9d1963c958 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -2412,6 +2412,12 @@
     ],
     "sqlState" : "42883"
   },
+  "RULE_ID_NOT_FOUND" : {
+    "message" : [
+      "Not found an id for the rule name \"<ruleName>\". Please modify RuleIdCollection.scala if you are adding a new rule."
+    ],
+    "sqlState" : "22023"
+  },
   "SCALAR_SUBQUERY_IS_IN_GROUP_BY_OR_AGGREGATE_FUNCTION" : {
     "message" : [
       "The correlated scalar subquery '<sqlExpr>' is neither present in GROUP BY, nor in an aggregate function. Add it to GROUP BY using ordinal position or wrap it in `first()` (or `first_value`) if you don't care which value you get."
@@ -5425,11 +5431,6 @@
       "."
     ]
   },
-  "_LEGACY_ERROR_TEMP_2175" : {
-    "message" : [
-      "Rule id not found for <ruleName>. Please modify RuleIdCollection.scala if you are adding a new rule."
-    ]
-  },
   "_LEGACY_ERROR_TEMP_2176" : {
     "message" : [
       "Cannot create array with <numElements> elements of data due to exceeding the limit <maxRoundedArrayLength> elements for ArrayData. <additionalErrorMessage>"
diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md
index 9e2a484d057..e1430e94db5 100644
--- a/docs/sql-error-conditions.md
+++ b/docs/sql-error-conditions.md
@@ -1578,6 +1578,12 @@ The function `<routineName>` cannot be found. Verify the spelling and correctnes
 If you did not qualify the name with a schema and catalog, verify the current_schema() output, or qualify the name with the correct schema and catalog.
 To tolerate the error on drop use DROP FUNCTION IF EXISTS.
 
+### RULE_ID_NOT_FOUND
+
+[SQLSTATE: 22023](sql-error-conditions-sqlstates.html#class-22-data-exception)
+
+Not found an id for the rule name "`<ruleName>`". Please modify RuleIdCollection.scala if you are adding a new rule.
+
 ### SCALAR_SUBQUERY_IS_IN_GROUP_BY_OR_AGGREGATE_FUNCTION
 
 SQLSTATE: none assigned
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
index 89c080409e2..7685e0f907c 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
@@ -1584,9 +1584,8 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase with ExecutionE
   def ruleIdNotFoundForRuleError(ruleName: String): Throwable = {
     new SparkException(
-      errorClass = "_LEGACY_ERROR_TEMP_2175",
-      messageParameters = Map(
-        "ruleName" -> ruleName),
+      errorClass = "RULE_ID_NOT_FOUND",
+      messageParameters = Map("ruleName" -> ruleName),
       cause = null)
   }
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala b/sql/core/src/test/sc
[spark] branch master updated: [SPARK-44628][SQL] Clear some unused codes in "***Errors" and extract some common logic
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 1f10cc4a594 [SPARK-44628][SQL] Clear some unused codes in "***Errors" and extract some common logic
1f10cc4a594 is described below

commit 1f10cc4a59457ed0de0fd4dc0a1c61514d77261a
Author: panbingkun
AuthorDate: Mon Aug 7 12:01:47 2023 +0500

    [SPARK-44628][SQL] Clear some unused codes in "***Errors" and extract some common logic

    ### What changes were proposed in this pull request?

    The pr aims to clear some unused codes in "***Errors" and extract some common logic.

    ### Why are the changes needed?

    Make code clear.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass GA.

    Closes #42238 from panbingkun/clear_error.

    Authored-by: panbingkun
    Signed-off-by: Max Gekk
---
 .../apache/spark/sql/errors/DataTypeErrors.scala   | 18 ++---
 .../apache/spark/sql/errors/QueryErrorsBase.scala  |  6 +-
 .../spark/sql/errors/QueryExecutionErrors.scala    | 86 --
 3 files changed, 10 insertions(+), 100 deletions(-)

diff --git a/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala b/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala
index 7a34a386cd8..5e52e283338 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala
@@ -192,15 +192,7 @@ private[sql] object DataTypeErrors extends DataTypeErrorsBase {
       decimalPrecision: Int,
       decimalScale: Int,
       context: SQLQueryContext = null): ArithmeticException = {
-    new SparkArithmeticException(
-      errorClass = "NUMERIC_VALUE_OUT_OF_RANGE",
-      messageParameters = Map(
-        "value" -> value.toPlainString,
-        "precision" -> decimalPrecision.toString,
-        "scale" -> decimalScale.toString,
-        "config" -> toSQLConf("spark.sql.ansi.enabled")),
-      context = getQueryContext(context),
-      summary = getSummary(context))
+    numericValueOutOfRange(value, decimalPrecision, decimalScale, context)
   }
 
   def cannotChangeDecimalPrecisionError(
@@ -208,6 +200,14 @@ private[sql] object DataTypeErrors extends DataTypeErrorsBase {
       decimalPrecision: Int,
       decimalScale: Int,
       context: SQLQueryContext = null): ArithmeticException = {
+    numericValueOutOfRange(value, decimalPrecision, decimalScale, context)
+  }
+
+  private def numericValueOutOfRange(
+      value: Decimal,
+      decimalPrecision: Int,
+      decimalScale: Int,
+      context: SQLQueryContext): ArithmeticException = {
     new SparkArithmeticException(
       errorClass = "NUMERIC_VALUE_OUT_OF_RANGE",
       messageParameters = Map(
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala
index db256fbee87..26600117a0c 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala
@@ -18,7 +18,7 @@
 package org.apache.spark.sql.errors
 
 import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
-import org.apache.spark.sql.catalyst.util.{toPrettySQL, QuotingUtils}
+import org.apache.spark.sql.catalyst.util.toPrettySQL
 import org.apache.spark.sql.types.{DataType, DoubleType, FloatType}
 
 /**
@@ -55,10 +55,6 @@ private[sql] trait QueryErrorsBase extends DataTypeErrorsBase {
     quoteByDefault(toPrettySQL(e))
   }
 
-  def toSQLSchema(schema: String): String = {
-    QuotingUtils.toSQLSchema(schema)
-  }
-
   // Converts an error class parameter to its SQL representation
   def toSQLValue(v: Any, t: DataType): String = Literal.create(v, t) match {
     case Literal(null, _) => "NULL"
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
index 45b5d6b6692..f960a091ec0 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
@@ -32,7 +32,6 @@
 import org.apache.spark._
 import org.apache.spark.launcher.SparkLauncher
 import org.apache.spark.memory.SparkOutOfMemoryError
 import org.apache.spark.sql.AnalysisException
-import org.apache.spark.sql.catalyst.ScalaReflection.Schema
 import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.catalyst.analysis.UnresolvedGenerator
 import org.ap
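The `DataTypeErrors` diff above shows the extraction pattern the commit message describes: two public error builders that previously constructed an identical `NUMERIC_VALUE_OUT_OF_RANGE` exception now delegate to one private helper. A rough Python sketch of the same refactoring (the dict stands in for the exception, and `out_of_decimal_type_range_error` is a hypothetical name — the chunk only shows that method's parameter list, not its name):

```python
def _numeric_value_out_of_range(value, precision, scale):
    # The single place that knows how to build this error payload;
    # before the refactoring this construction was duplicated verbatim
    # in both public methods below.
    return {
        "errorClass": "NUMERIC_VALUE_OUT_OF_RANGE",
        "messageParameters": {
            "value": str(value),
            "precision": str(precision),
            "scale": str(scale),
            "config": "spark.sql.ansi.enabled",
        },
    }

def out_of_decimal_type_range_error(value, precision, scale):
    return _numeric_value_out_of_range(value, precision, scale)

def cannot_change_decimal_precision_error(value, precision, scale):
    return _numeric_value_out_of_range(value, precision, scale)
```

The payoff is the same as in the Scala diff: a later change to the error's parameters happens in exactly one place, and the two public entry points cannot drift apart.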
[spark] branch master updated (1f10cc4a594 -> f139733b92d)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 1f10cc4a594 [SPARK-44628][SQL] Clear some unused codes in "***Errors" and extract some common logic
     add f139733b92d [SPARK-42321][SQL] Assign name to _LEGACY_ERROR_TEMP_2133

No new revisions were added by this update.

Summary of changes:
 .../utils/src/main/resources/error/error-classes.json  | 10 +-
 ...ditions-malformed-record-in-parsing-error-class.md  |  4
 .../spark/sql/catalyst/json/JacksonParser.scala        |  8
 .../spark/sql/catalyst/util/BadRecordException.scala   |  9 +
 .../spark/sql/catalyst/util/FailureSafeParser.scala    |  3 +++
 .../spark/sql/errors/QueryExecutionErrors.scala        | 19 ---
 .../spark/sql/errors/QueryExecutionErrorsSuite.scala   | 17 +
 7 files changed, 54 insertions(+), 16 deletions(-)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-38475][CORE] Use error class in org.apache.spark.serializer
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 2a23c7a18a0 [SPARK-38475][CORE] Use error class in org.apache.spark.serializer
2a23c7a18a0 is described below

commit 2a23c7a18a0ba75d95ee1d898896a8f0dc2c5531
Author: Bo Zhang
AuthorDate: Mon Aug 7 22:10:01 2023 +0500

    [SPARK-38475][CORE] Use error class in org.apache.spark.serializer

    ### What changes were proposed in this pull request?

    This PR aims to change exceptions created in package org.apache.spark.serializer to use error class.

    ### Why are the changes needed?

    This is to move exceptions created in package org.apache.spark.serializer to error class.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Existing tests.

    Closes #42243 from bozhang2820/spark-38475.

    Lead-authored-by: Bo Zhang
    Co-authored-by: Bo Zhang
    Signed-off-by: Max Gekk
---
 .../src/main/resources/error/error-classes.json    | 21 +
 .../spark/serializer/GenericAvroSerializer.scala   |  6 ++---
 .../apache/spark/serializer/KryoSerializer.scala   | 27 --
 docs/sql-error-conditions.md                       | 24 +++
 4 files changed, 68 insertions(+), 10 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json
index 680f787429c..0ea1eed35e4 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -831,6 +831,11 @@
       "Not found an encoder of the type <typeName> to Spark SQL internal representation. Consider to change the input type to one of supported at '<docroot>/sql-ref-datatypes.html'."
     ]
   },
+  "ERROR_READING_AVRO_UNKNOWN_FINGERPRINT" : {
+    "message" : [
+      "Error reading avro data -- encountered an unknown fingerprint: <fingerprint>, not sure what schema to use. This could happen if you registered additional schemas after starting your spark context."
+    ]
+  },
   "EVENT_TIME_IS_NOT_ON_TIMESTAMP_TYPE" : {
     "message" : [
       "The event time <eventName> has the invalid type <eventType>, but expected \"TIMESTAMP\"."
@@ -864,6 +869,11 @@
     ],
     "sqlState" : "22018"
   },
+  "FAILED_REGISTER_CLASS_WITH_KRYO" : {
+    "message" : [
+      "Failed to register classes with Kryo."
+    ]
+  },
   "FAILED_RENAME_PATH" : {
     "message" : [
       "Failed to rename <sourcePath> to <targetPath> as destination already exists."
@@ -1564,6 +1574,12 @@
     ],
     "sqlState" : "22032"
   },
+  "INVALID_KRYO_SERIALIZER_BUFFER_SIZE" : {
+    "message" : [
+      "The value of the config \"<bufferSizeConfKey>\" must be less than 2048 MiB, but got <bufferSizeConfValue> MiB."
+    ],
+    "sqlState" : "F0000"
+  },
   "INVALID_LAMBDA_FUNCTION_CALL" : {
     "message" : [
       "Invalid lambda function call."
@@ -2006,6 +2022,11 @@
       "The join condition <joinCondition> has the invalid type <conditionType>, expected \"BOOLEAN\"."
     ]
   },
+  "KRYO_BUFFER_OVERFLOW" : {
+    "message" : [
+      "Kryo serialization failed: <exceptionMsg>. To avoid this, increase \"<bufferSizeConfKey>\" value."
+    ]
+  },
   "LOAD_DATA_PATH_NOT_EXISTS" : {
     "message" : [
       "LOAD DATA input path does not exist: <path>."
diff --git a/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala b/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala
index 7d2923fdf37..d09abff2773 100644
--- a/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala
+++ b/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala
@@ -140,9 +140,9 @@ private[serializer] class GenericAvroSerializer[D <: GenericContainer]
         case Some(s) => new Schema.Parser().setValidateDefaults(false).parse(s)
         case None => throw new SparkException(
-          "Error reading attempting to read avro data -- encountered an unknown " +
-            s"fingerprint: $fingerprint, not sure what schema to use. This could happen " +
-            "if you registered additional schemas after starting your spark context.")
+          errorClass = "ERROR_READING_AVRO_UNKNOWN_FINGERPRINT",
+          messageParameters = Map("fingerprint" -> fingerprint.toString),
+          cause = null)
       }
     })
   } else {
diff --git a/core/src/main/scala/org/apache/spark/seriali
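The `GenericAvroSerializer` hunk above replaces a hand-concatenated message with a structured error when a schema fingerprint has no registered schema. The lookup-or-raise pattern can be sketched roughly like this in Python (the exception class and function names are ours for illustration, not Spark's API):

```python
class UnknownFingerprintError(Exception):
    """Structured error carrying an error class and named parameters."""

    def __init__(self, fingerprint):
        self.error_class = "ERROR_READING_AVRO_UNKNOWN_FINGERPRINT"
        self.message_parameters = {"fingerprint": str(fingerprint)}
        super().__init__(
            f"[{self.error_class}] Error reading avro data -- encountered an "
            f"unknown fingerprint: {fingerprint}, not sure what schema to use."
        )

def resolve_schema(fingerprint, registered_schemas):
    """Return the schema registered for a fingerprint, or raise a structured error."""
    try:
        return registered_schemas[fingerprint]
    except KeyError:
        raise UnknownFingerprintError(fingerprint) from None
```

The point of the change is that callers (and tests like `QueryExecutionErrorsSuite`) can now match on the stable `error_class` and `message_parameters` instead of parsing a free-form string.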
[spark] branch branch-3.5 updated: [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new b623c28f521 [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`
b623c28f521 is described below

commit b623c28f521e350b0f4bf15bfb911ca6bf0b1a80
Author: Max Gekk
AuthorDate: Tue Aug 8 13:26:19 2023 +0500

    [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`

    ### What changes were proposed in this pull request?

    In the PR, I propose to check whether the `DEFAULT` clause contains a parameter. If so, raise an appropriate error saying that the feature is not supported. Currently, table creation with a `DEFAULT` containing parameters finishes successfully even though parameters are not supported in this case:

    ```sql
    scala> spark.sql("CREATE TABLE t12(c1 int default :parm)", args = Map("parm" -> 5)).show()
    ++
    ||
    ++
    ++

    scala> spark.sql("describe t12");
    org.apache.spark.sql.AnalysisException: [INVALID_DEFAULT_VALUE.UNRESOLVED_EXPRESSION] Failed to execute EXISTS_DEFAULT command because the destination table column `c1` has a DEFAULT value :parm, which fails to resolve as a valid expression.
    ```

    ### Why are the changes needed?

    This improves user experience with Spark SQL by reporting the root cause of the issue.

    ### Does this PR introduce _any_ user-facing change?

    Yes. After the change, the table creation completes w/ the error:

    ```sql
    scala> spark.sql("CREATE TABLE t12(c1 int default :parm)", args = Map("parm" -> 5)).show()
    org.apache.spark.sql.catalyst.parser.ParseException: [UNSUPPORTED_FEATURE.PARAMETER_MARKER_IN_UNEXPECTED_STATEMENT] The feature is not supported: Parameter markers are not allowed in DEFAULT.(line 1, pos 32)

    == SQL ==
    CREATE TABLE t12(c1 int default :parm)
    --------------------------------^^^
    ```

    ### How was this patch tested?

    By running new test:
    ```
    $ build/sbt "test:testOnly *ParametersSuite"
    ```

    Closes #42365 from MaxGekk/fix-param-in-DEFAULT.

    Authored-by: Max Gekk
    Signed-off-by: Max Gekk
    (cherry picked from commit f7879b4c2500046cd7d889ba94adedd3000f8c41)
    Signed-off-by: Max Gekk
---
 .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 12
 .../test/scala/org/apache/spark/sql/ParametersSuite.scala | 15 +++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 7a28efa3e42..83938632e53 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -40,6 +40,7 @@
 import org.apache.spark.sql.catalyst.parser.SqlBaseParser._
 import org.apache.spark.sql.catalyst.plans._
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.trees.CurrentOrigin
+import org.apache.spark.sql.catalyst.trees.TreePattern.PARAMETER
 import org.apache.spark.sql.catalyst.types.DataTypeUtils
 import org.apache.spark.sql.catalyst.util.{CharVarcharUtils, DateTimeUtils, GeneratedColumn, IntervalUtils, ResolveDefaultColumns}
 import org.apache.spark.sql.catalyst.util.DateTimeUtils.{convertSpecialDate, convertSpecialTimestamp, convertSpecialTimestampNTZ, getZoneId, stringToDate, stringToTimestamp, stringToTimestampWithoutTimeZone}
@@ -3130,9 +3131,12 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
     ctx.asScala.headOption.map(visitLocationSpec)
   }
 
-  private def verifyAndGetExpression(exprCtx: ExpressionContext): String = {
+  private def verifyAndGetExpression(exprCtx: ExpressionContext, place: String): String = {
     // Make sure it can be converted to Catalyst expressions.
-    expression(exprCtx)
+    val expr = expression(exprCtx)
+    if (expr.containsPattern(PARAMETER)) {
+      throw QueryParsingErrors.parameterMarkerNotAllowed(place, expr.origin)
+    }
     // Extract the raw expression text so that we can save the user provided text. We don't
     // use `Expression.sql` to avoid storing incorrect text caused by bugs in any expression's
     // `sql` method. Note: `exprCtx.getText` returns a string without spaces, so we need to
@@ -3147,7 +3151,7 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
    */
   override def visitDefaultExpression(ctx: DefaultExpressionContext): String =
     withOrigin(ctx) {
-      verifyAndGetExpression(ctx.expression())
+      verifyAndGetExpression(ctx.expression(), "DEFAULT")
     }
 
   /**
@@
[spark] branch master updated: [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new f7879b4c250 [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`
f7879b4c250 is described below

commit f7879b4c2500046cd7d889ba94adedd3000f8c41
Author: Max Gekk
AuthorDate: Tue Aug 8 13:26:19 2023 +0500

    [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`

    ### What changes were proposed in this pull request?

    In the PR, I propose to check whether the `DEFAULT` clause contains a parameter. If so, raise an appropriate error saying that the feature is not supported. Currently, table creation with a `DEFAULT` containing parameters finishes successfully even though parameters are not supported in this case:

    ```sql
    scala> spark.sql("CREATE TABLE t12(c1 int default :parm)", args = Map("parm" -> 5)).show()
    ++
    ||
    ++
    ++

    scala> spark.sql("describe t12");
    org.apache.spark.sql.AnalysisException: [INVALID_DEFAULT_VALUE.UNRESOLVED_EXPRESSION] Failed to execute EXISTS_DEFAULT command because the destination table column `c1` has a DEFAULT value :parm, which fails to resolve as a valid expression.
    ```

    ### Why are the changes needed?

    This improves user experience with Spark SQL by reporting the root cause of the issue.

    ### Does this PR introduce _any_ user-facing change?

    Yes. After the change, the table creation completes w/ the error:

    ```sql
    scala> spark.sql("CREATE TABLE t12(c1 int default :parm)", args = Map("parm" -> 5)).show()
    org.apache.spark.sql.catalyst.parser.ParseException: [UNSUPPORTED_FEATURE.PARAMETER_MARKER_IN_UNEXPECTED_STATEMENT] The feature is not supported: Parameter markers are not allowed in DEFAULT.(line 1, pos 32)

    == SQL ==
    CREATE TABLE t12(c1 int default :parm)
    --------------------------------^^^
    ```

    ### How was this patch tested?

    By running new test:
    ```
    $ build/sbt "test:testOnly *ParametersSuite"
    ```

    Closes #42365 from MaxGekk/fix-param-in-DEFAULT.

    Authored-by: Max Gekk
    Signed-off-by: Max Gekk
---
 .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 12
 .../test/scala/org/apache/spark/sql/ParametersSuite.scala | 15 +++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 1b9dda51bf0..0635e6a1b44 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -40,6 +40,7 @@
 import org.apache.spark.sql.catalyst.parser.SqlBaseParser._
 import org.apache.spark.sql.catalyst.plans._
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.trees.CurrentOrigin
+import org.apache.spark.sql.catalyst.trees.TreePattern.PARAMETER
 import org.apache.spark.sql.catalyst.types.DataTypeUtils
 import org.apache.spark.sql.catalyst.util.{CharVarcharUtils, DateTimeUtils, GeneratedColumn, IntervalUtils, ResolveDefaultColumns}
 import org.apache.spark.sql.catalyst.util.DateTimeUtils.{convertSpecialDate, convertSpecialTimestamp, convertSpecialTimestampNTZ, getZoneId, stringToDate, stringToTimestamp, stringToTimestampWithoutTimeZone}
@@ -3153,9 +3154,12 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
     ctx.asScala.headOption.map(visitLocationSpec)
   }
 
-  private def verifyAndGetExpression(exprCtx: ExpressionContext): String = {
+  private def verifyAndGetExpression(exprCtx: ExpressionContext, place: String): String = {
     // Make sure it can be converted to Catalyst expressions.
-    expression(exprCtx)
+    val expr = expression(exprCtx)
+    if (expr.containsPattern(PARAMETER)) {
+      throw QueryParsingErrors.parameterMarkerNotAllowed(place, expr.origin)
+    }
     // Extract the raw expression text so that we can save the user provided text. We don't
     // use `Expression.sql` to avoid storing incorrect text caused by bugs in any expression's
     // `sql` method. Note: `exprCtx.getText` returns a string without spaces, so we need to
@@ -3170,7 +3174,7 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
    */
   override def visitDefaultExpression(ctx: DefaultExpressionContext): String =
     withOrigin(ctx) {
-      verifyAndGetExpression(ctx.expression())
+      verifyAndGetExpression(ctx.expression(), "DEFAULT")
     }
 
   /**
@@ -3178,7 +3182,7 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
    */
  over
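The fix above walks the parsed `DEFAULT` expression and rejects it if any node in the tree is a parameter marker (`expr.containsPattern(PARAMETER)`). The tree check itself is a plain recursive descent; a toy sketch in Python (this is a stand-in expression tree, not Catalyst's `TreeNode` API, and the node kinds are illustrative):

```python
class Expr:
    """A minimal expression-tree node: a kind tag plus children."""

    def __init__(self, kind, children=()):
        self.kind = kind
        self.children = list(children)

    def contains_pattern(self, pattern):
        # True if this node or any descendant has the given kind.
        return self.kind == pattern or any(
            c.contains_pattern(pattern) for c in self.children
        )

def verify_default_expression(expr):
    """Reject DEFAULT expressions that contain a parameter marker."""
    if expr.contains_pattern("PARAMETER"):
        raise ValueError(
            "[UNSUPPORTED_FEATURE.PARAMETER_MARKER_IN_UNEXPECTED_STATEMENT] "
            "Parameter markers are not allowed in DEFAULT."
        )
    return expr

# `DEFAULT :parm` would parse to a tree containing a PARAMETER node:
default_expr = Expr("Cast", [Expr("PARAMETER")])
```

Catalyst's real `containsPattern` is faster than this naive traversal (it caches pattern bit-sets per subtree), but the observable behavior is the same: the check fires at parse time, before any `CREATE TABLE` metadata is written.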
[spark] branch master updated: [SPARK-44778][SQL] Add the alias `TIMEDIFF` for `TIMESTAMPDIFF`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new b9fc5c03ed6 [SPARK-44778][SQL] Add the alias `TIMEDIFF` for `TIMESTAMPDIFF`
b9fc5c03ed6 is described below

commit b9fc5c03ed69e91d9c4cbe7ff5a1522c7b849568
Author: Max Gekk
AuthorDate: Sat Aug 12 11:08:39 2023 +0500

    [SPARK-44778][SQL] Add the alias `TIMEDIFF` for `TIMESTAMPDIFF`

    ### What changes were proposed in this pull request?

    In the PR, I propose to extend the rules of `primaryExpression` in `SqlBaseParser.g4` with one more function, `TIMEDIFF`, which accepts 3 args in the same way as the existing expression `TIMESTAMPDIFF`.

    ### Why are the changes needed?

    To achieve feature parity w/ other systems and make the migration to Spark SQL from such systems easier:
    1. Snowflake: https://docs.snowflake.com/en/sql-reference/functions/timediff
    2. MySQL/MariaDB: https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_timediff

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    By running the existing test suites:
    ```
    $ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"
    ```

    Closes #42435 from MaxGekk/timediff.

    Authored-by: Max Gekk
    Signed-off-by: Max Gekk
---
 docs/sql-ref-ansi-compliance.md                        |  1 +
 .../spark/sql/catalyst/parser/SqlBaseLexer.g4          |  1 +
 .../spark/sql/catalyst/parser/SqlBaseParser.g4         |  4 +-
 .../analyzer-results/ansi/timestamp.sql.out            | 68 ++
 .../analyzer-results/datetime-legacy.sql.out           | 68 ++
 .../sql-tests/analyzer-results/timestamp.sql.out       | 68 ++
 .../timestampNTZ/timestamp-ansi.sql.out                | 70 +++
 .../timestampNTZ/timestamp.sql.out                     | 70 +++
 .../test/resources/sql-tests/inputs/timestamp.sql      |  8 +++
 .../sql-tests/results/ansi/keywords.sql.out            |  1 +
 .../sql-tests/results/ansi/timestamp.sql.out           | 80 ++
 .../sql-tests/results/datetime-legacy.sql.out          | 80 ++
 .../resources/sql-tests/results/keywords.sql.out       |  1 +
 .../resources/sql-tests/results/timestamp.sql.out      | 80 ++
 .../results/timestampNTZ/timestamp-ansi.sql.out        | 80 ++
 .../results/timestampNTZ/timestamp.sql.out             | 80 ++
 .../ThriftServerWithSparkContextSuite.scala            |  2 +-
 17 files changed, 760 insertions(+), 2 deletions(-)

diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md
index f3a0e8f9afb..09c38a00995 100644
--- a/docs/sql-ref-ansi-compliance.md
+++ b/docs/sql-ref-ansi-compliance.md
@@ -636,6 +636,7 @@ Below is a list of all the keywords in Spark SQL.
 |TERMINATED|non-reserved|non-reserved|non-reserved|
 |THEN|reserved|non-reserved|reserved|
 |TIME|reserved|non-reserved|reserved|
+|TIMEDIFF|non-reserved|non-reserved|non-reserved|
 |TIMESTAMP|non-reserved|non-reserved|non-reserved|
 |TIMESTAMP_LTZ|non-reserved|non-reserved|non-reserved|
 |TIMESTAMP_NTZ|non-reserved|non-reserved|non-reserved|
diff --git a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
index bf6370575a1..d9128de0f5d 100644
--- a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
+++ b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
@@ -373,6 +373,7 @@ TEMPORARY: 'TEMPORARY' | 'TEMP';
 TERMINATED: 'TERMINATED';
 THEN: 'THEN';
 TIME: 'TIME';
+TIMEDIFF: 'TIMEDIFF';
 TIMESTAMP: 'TIMESTAMP';
 TIMESTAMP_LTZ: 'TIMESTAMP_LTZ';
 TIMESTAMP_NTZ: 'TIMESTAMP_NTZ';
diff --git a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
index a45ebee3106..7a69b10dadb 100644
--- a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
+++ b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
@@ -953,7 +953,7 @@ datetimeUnit
 primaryExpression
     : name=(CURRENT_DATE | CURRENT_TIMESTAMP | CURRENT_USER | USER)    #currentLike
     | name=(TIMESTAMPADD | DATEADD | DATE_ADD) LEFT_PAREN (unit=datetimeUnit | invalidUnit=stringLit) COMMA unitsAmount=valueExpression COMMA timestamp=valueExpression RIGHT_PAREN    #timestampadd
-    | name=(TIMESTAMPDIFF | DATEDIFF | DATE_DIFF) LEFT_PAREN (unit=datetimeUnit | invalidUnit=stringLit) COMMA startTimestamp=valueExpression COM
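The grammar change above makes `TIMEDIFF` just another name accepted by the same `primaryExpression` alternative as `TIMESTAMPDIFF`, `DATEDIFF`, and `DATE_DIFF`, so every alias lowers to one canonical operation. The alias-dispatch idea can be sketched in a few lines of Python (a toy resolver, not the ANTLR-generated parser; the tuple is a stand-in for a plan node):

```python
# Function names the (toy) parser accepts for the 3-argument
# timestamp-difference form, mirroring the grammar alternatives
# TIMESTAMPDIFF | DATEDIFF | DATE_DIFF plus the new TIMEDIFF alias.
TIMESTAMPDIFF_ALIASES = {"TIMESTAMPDIFF", "DATEDIFF", "DATE_DIFF", "TIMEDIFF"}

def parse_timestamp_diff(func_name, unit, start_ts, end_ts):
    """Lower any accepted alias to one canonical operation (a tuple here)."""
    if func_name.upper() not in TIMESTAMPDIFF_ALIASES:
        raise ValueError(f"not a timestamp-difference function: {func_name}")
    return ("TIMESTAMPDIFF", unit.upper(), start_ts, end_ts)
```

Because aliasing happens at the grammar level, `TIMEDIFF(HOUR, t1, t2)` and `TIMESTAMPDIFF(HOUR, t1, t2)` produce identical plans, which is why the PR only needs golden-file test updates rather than a new expression implementation.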
[spark] branch master updated: [SPARK-44404][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1009,1010,1013,1015,1016,1278]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 295c615b16b [SPARK-44404][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1009,1010,1013,1015,1016,1278] 295c615b16b is described below commit 295c615b16b8a77f242ffa99006b4fb95f8f3487 Author: panbingkun AuthorDate: Sat Aug 12 12:22:28 2023 +0500 [SPARK-44404][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1009,1010,1013,1015,1016,1278] ### What changes were proposed in this pull request? The PR aims to assign names to the error classes, including: - _LEGACY_ERROR_TEMP_1009 => VIEW_EXCEED_MAX_NESTED_DEPTH - _LEGACY_ERROR_TEMP_1010 => UNSUPPORTED_VIEW_OPERATION.WITHOUT_SUGGESTION - _LEGACY_ERROR_TEMP_1013 => UNSUPPORTED_VIEW_OPERATION.WITH_SUGGESTION / UNSUPPORTED_TEMP_VIEW_OPERATION.WITH_SUGGESTION - _LEGACY_ERROR_TEMP_1014 => UNSUPPORTED_TEMP_VIEW_OPERATION.WITHOUT_SUGGESTION - _LEGACY_ERROR_TEMP_1015 => UNSUPPORTED_TABLE_OPERATION.WITH_SUGGESTION - _LEGACY_ERROR_TEMP_1016 => UNSUPPORTED_TEMP_VIEW_OPERATION.WITHOUT_SUGGESTION - _LEGACY_ERROR_TEMP_1278 => UNSUPPORTED_TABLE_OPERATION.WITHOUT_SUGGESTION ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Pass GA. - Manual tests. - Updated UTs. Closes #42109 from panbingkun/SPARK-44404.
Lead-authored-by: panbingkun Co-authored-by: panbingkun <84731...@qq.com> Signed-off-by: Max Gekk --- R/pkg/tests/fulltests/test_sparkSQL.R | 3 +- .../src/main/resources/error/error-classes.json| 91 --- ...ions-unsupported-table-operation-error-class.md | 36 +++ ...-unsupported-temp-view-operation-error-class.md | 36 +++ ...tions-unsupported-view-operation-error-class.md | 36 +++ docs/sql-error-conditions.md | 30 +++ .../spark/sql/catalyst/analysis/Analyzer.scala | 9 +- .../sql/catalyst/analysis/v2ResolutionPlans.scala | 4 +- .../spark/sql/catalyst/parser/AstBuilder.scala | 32 ++- .../spark/sql/errors/QueryCompilationErrors.scala | 90 --- .../spark/sql/catalyst/parser/DDLParserSuite.scala | 104 .../apache/spark/sql/execution/command/views.scala | 2 +- .../apache/spark/sql/internal/CatalogImpl.scala| 2 +- .../analyzer-results/change-column.sql.out | 16 +- .../sql-tests/results/change-column.sql.out| 16 +- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 7 +- .../apache/spark/sql/execution/SQLViewSuite.scala | 267 ++--- .../spark/sql/execution/SQLViewTestSuite.scala | 23 +- .../AlterTableAddPartitionParserSuite.scala| 4 +- .../AlterTableDropPartitionParserSuite.scala | 8 +- .../AlterTableRecoverPartitionsParserSuite.scala | 8 +- .../AlterTableRenamePartitionParserSuite.scala | 4 +- .../command/AlterTableSetLocationParserSuite.scala | 6 +- .../command/AlterTableSetSerdeParserSuite.scala| 16 +- .../spark/sql/execution/command/DDLSuite.scala | 36 ++- .../command/MsckRepairTableParserSuite.scala | 13 +- .../command/ShowPartitionsParserSuite.scala| 10 +- .../command/TruncateTableParserSuite.scala | 6 +- .../execution/command/TruncateTableSuiteBase.scala | 45 +++- .../execution/command/v1/ShowPartitionsSuite.scala | 57 - .../apache/spark/sql/internal/CatalogSuite.scala | 13 +- .../spark/sql/hive/execution/HiveDDLSuite.scala| 94 +++- 32 files changed, 717 insertions(+), 407 deletions(-) diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R 
b/R/pkg/tests/fulltests/test_sparkSQL.R index d61501d248a..47688d7560c 100644 --- a/R/pkg/tests/fulltests/test_sparkSQL.R +++ b/R/pkg/tests/fulltests/test_sparkSQL.R @@ -4193,8 +4193,7 @@ test_that("catalog APIs, listTables, getTable, listColumns, listFunctions, funct # recoverPartitions does not work with temporary view expect_error(recoverPartitions("cars"), - paste("Error in recoverPartitions : analysis error - cars is a temp view.", - "'recoverPartitions()' expects a table"), fixed = TRUE) + "[UNSUPPORTED_TEMP_VIEW_OPERATION.WITH_SUGGESTION]*`cars`*") expect_error(refreshTable("cars"), NA) expect_error(refreshByPath("/"), NA) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 133c2dd826c..08f79bcecbb 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -3394,12 +3394,63 @@
[spark] branch master updated: [SPARK-43780][SQL][FOLLOWUP] Fix the config doc `spark.sql.optimizer.decorrelateJoinPredicate.enabled`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 24293cab2de [SPARK-43780][SQL][FOLLOWUP] Fix the config doc `spark.sql.optimizer.decorrelateJoinPredicate.enabled` 24293cab2de is described below commit 24293cab2de06a50ffd9f4871073e75481665bb8 Author: Max Gekk AuthorDate: Tue Aug 22 15:32:32 2023 +0300 [SPARK-43780][SQL][FOLLOWUP] Fix the config doc `spark.sql.optimizer.decorrelateJoinPredicate.enabled` ### What changes were proposed in this pull request? Add s" to the doc of the SQL config `spark.sql.optimizer.decorrelateJoinPredicate.enabled`. ### Why are the changes needed? To output the desired config name. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running CI. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42607 from MaxGekk/followup-agubichev_spark-43780-corr-predicate. Authored-by: Max Gekk Signed-off-by: Max Gekk --- sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 9b421251cf6..ca155683ec0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -4363,7 +4363,7 @@ object SQLConf { .internal() .doc("Decorrelate scalar and lateral subqueries with correlated references in join " + "predicates. 
This configuration is only effective when " + -"'${DECORRELATE_INNER_QUERY_ENABLED.key}' is true.") +s"'${DECORRELATE_INNER_QUERY_ENABLED.key}' is true.") .version("4.0.0") .booleanConf .createWithDefault(true) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
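The one-character fix above is the classic missing-interpolator bug: without the `s` prefix, Scala treats the string as a plain literal, so the `${...}` placeholder lands verbatim in the config doc instead of the referenced config name. Python's f-strings fail the same way; a minimal illustration (the key string here is only illustrative):

```python
key = "spark.sql.optimizer.decorrelateInnerQuery.enabled"

# Without the f prefix, the placeholder is NOT substituted -- the same
# class of bug as a Scala interpolated string missing its s prefix.
plain = "This configuration is only effective when '{key}' is true."
fixed = f"This configuration is only effective when '{key}' is true."

print(plain)  # the literal text '{key}' leaks into the doc
print(fixed)  # the real config name is substituted
```

The bug is easy to miss in review because both forms compile and the broken string still looks plausible in the output.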
[spark] branch branch-3.4 updated: [SPARK-44871][SQL][3.4] Fix percentile_disc behaviour
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 0060279f733 [SPARK-44871][SQL][3.4] Fix percentile_disc behaviour 0060279f733 is described below commit 0060279f733989b03aca2bbb0624dfc0c3193aae Author: Peter Toth AuthorDate: Tue Aug 22 19:27:15 2023 +0300 [SPARK-44871][SQL][3.4] Fix percentile_disc behaviour ### What changes were proposed in this pull request? This PR fixes the `percentile_disc()` function, as it currently returns incorrect results in some cases. E.g.: ``` SELECT percentile_disc(0.0) WITHIN GROUP (ORDER BY a) as p0, percentile_disc(0.1) WITHIN GROUP (ORDER BY a) as p1, percentile_disc(0.2) WITHIN GROUP (ORDER BY a) as p2, percentile_disc(0.3) WITHIN GROUP (ORDER BY a) as p3, percentile_disc(0.4) WITHIN GROUP (ORDER BY a) as p4, percentile_disc(0.5) WITHIN GROUP (ORDER BY a) as p5, percentile_disc(0.6) WITHIN GROUP (ORDER BY a) as p6, percentile_disc(0.7) WITHIN GROUP (ORDER BY a) as p7, percentile_disc(0.8) WITHIN GROUP (ORDER BY a) as p8, percentile_disc(0.9) WITHIN GROUP (ORDER BY a) as p9, percentile_disc(1.0) WITHIN GROUP (ORDER BY a) as p10 FROM VALUES (0), (1), (2), (3), (4) AS v(a) ``` currently returns: ``` +---+---+---+---+---+---+---+---+---+---+---+ | p0| p1| p2| p3| p4| p5| p6| p7| p8| p9|p10| +---+---+---+---+---+---+---+---+---+---+---+ |0.0|0.0|0.0|1.0|1.0|2.0|2.0|2.0|3.0|3.0|4.0| +---+---+---+---+---+---+---+---+---+---+---+ ``` but after this PR it returns the correct: ``` +---+---+---+---+---+---+---+---+---+---+---+ | p0| p1| p2| p3| p4| p5| p6| p7| p8| p9|p10| +---+---+---+---+---+---+---+---+---+---+---+ |0.0|0.0|0.0|1.0|1.0|2.0|2.0|3.0|3.0|4.0|4.0| +---+---+---+---+---+---+---+---+---+---+---+ ``` ### Why are the changes needed? Bugfix. ### Does this PR introduce _any_ user-facing change? 
Yes, fixes a correctness bug, but the old behaviour can be restored with `spark.sql.legacy.percentileDiscCalculation=true`. ### How was this patch tested? Added new UTs. Closes #42610 from peter-toth/SPARK-44871-fix-percentile-disc-behaviour-3.4. Authored-by: Peter Toth Signed-off-by: Max Gekk --- .../expressions/aggregate/percentiles.scala| 39 +-- .../org/apache/spark/sql/internal/SQLConf.scala| 10 ++ .../resources/sql-tests/inputs/percentiles.sql | 77 +- .../sql-tests/results/percentiles.sql.out | 116 + 4 files changed, 234 insertions(+), 8 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala index 8447a5f9b51..da04c5a1c8a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala @@ -27,6 +27,7 @@ import org.apache.spark.sql.catalyst.expressions.Cast._ import org.apache.spark.sql.catalyst.trees.{BinaryLike, TernaryLike, UnaryLike} import org.apache.spark.sql.catalyst.util._ import org.apache.spark.sql.errors.QueryExecutionErrors +import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types._ import org.apache.spark.sql.types.TypeCollection.NumericAndAnsiInterval import org.apache.spark.util.collection.OpenHashMap @@ -168,11 +169,8 @@ abstract class PercentileBase val accumulatedCounts = sortedCounts.scanLeft((sortedCounts.head._1, 0L)) { case ((key1, count1), (key2, count2)) => (key2, count1 + count2) }.tail -val maxPosition = accumulatedCounts.last._2 - 1 -percentages.map { percentile => - getPercentile(accumulatedCounts, maxPosition * percentile) -} +percentages.map(getPercentile(accumulatedCounts, _)) } private def generateOutput(percentiles: Seq[Double]): Any = { @@ -195,8 +193,11 @@ abstract class PercentileBase * This 
function has been based upon similar function from HIVE * `org.apache.hadoop.hive.ql.udf.UDAFPercentile.getPercentile()`. */ - private def getPercentile( - accumulatedCounts: Seq[(AnyRef, Long)], position: Double): Double = { + protected def getPercentile( + accumulatedCounts: Seq[(AnyRef, Long)], + percentile: Double): Double = { +val position = (accumulatedCounts.last._2 - 1) * percentile + // We may need to do linear interpolation to get the exact percentile val lower = position.floor.toLong val higher = position.ceil.toLong @@ -219,6 +220,7 @@ abstract class PercentileBase } if (discrete) { +
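The corrected result rows follow the SQL-standard definition of a discrete percentile: the smallest value whose cumulative distribution (rank / n) reaches the requested fraction. A minimal Python sketch of those semantics — not Spark's actual implementation, which works over accumulated counts as in the diff above:

```python
def percentile_disc(values, fraction):
    """Return the smallest value whose cumulative distribution
    (rank / n over the sorted values) is >= fraction."""
    xs = sorted(values)
    n = len(xs)
    for rank, v in enumerate(xs, start=1):
        if rank / n >= fraction:
            return v
    return xs[-1]

data = [0, 1, 2, 3, 4]
print([percentile_disc(data, p / 10) for p in range(11)])
# -> [0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4]  (matches the fixed p0..p10 row)
```

Note how p7 = 3 (cumulative distribution of 3 is 0.8 >= 0.7), which is exactly the cell that changed in the "after" table.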
[spark] branch master updated: [SPARK-44840][SQL] Make `array_insert()` 1-based for negative indexes
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ce50a563d31 [SPARK-44840][SQL] Make `array_insert()` 1-based for negative indexes ce50a563d31 is described below commit ce50a563d311ccfe36d1fcc4f0743e4e4d7d8116 Author: Max Gekk AuthorDate: Tue Aug 22 21:04:32 2023 +0300 [SPARK-44840][SQL] Make `array_insert()` 1-based for negative indexes ### What changes were proposed in this pull request? In the PR, I propose to make the `array_insert` function 1-based for negative indexes. So, the maximum negative index, -1, should point to the last element, and the function should insert a new element at the end of the given array for the index -1. The old behaviour can be restored via the SQL config `spark.sql.legacy.negativeIndexInArrayInsert`. ### Why are the changes needed? 1. To match the behaviour of functions such as `substr()` and `element_at()`. ```sql spark-sql (default)> select element_at(array('a', 'b'), -1), substr('ab', -1); b b ``` 2. To fix an inconsistency in `array_insert` in which positive indexes are 1-based, but negative indexes are 0-based. ### Does this PR introduce _any_ user-facing change? Yes. Before: ```sql spark-sql (default)> select array_insert(array('a', 'b'), -1, 'c'); ["a","c","b"] ``` After: ```sql spark-sql (default)> select array_insert(array('a', 'b'), -1, 'c'); ["a","b","c"] ``` ### How was this patch tested? By running the modified test suite: ``` $ build/sbt "test:testOnly *CollectionExpressionsSuite" $ build/sbt "test:testOnly *DataFrameFunctionsSuite" $ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite" ``` Closes #42564 from MaxGekk/fix-array_insert. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../explain-results/function_array_insert.explain | 2 +- .../explain-results/function_array_prepend.explain | 2 +- docs/sql-migration-guide.md| 1 + python/pyspark/sql/functions.py| 2 +- .../expressions/collectionOperations.scala | 37 ++-- .../org/apache/spark/sql/internal/SQLConf.scala| 16 +++ .../expressions/CollectionExpressionsSuite.scala | 50 +++--- .../scala/org/apache/spark/sql/functions.scala | 2 +- .../sql-tests/analyzer-results/ansi/array.sql.out | 44 +++ .../sql-tests/analyzer-results/array.sql.out | 44 +++ .../src/test/resources/sql-tests/inputs/array.sql | 5 +++ .../resources/sql-tests/results/ansi/array.sql.out | 34 ++- .../test/resources/sql-tests/results/array.sql.out | 34 ++- .../apache/spark/sql/DataFrameFunctionsSuite.scala | 6 ++- 14 files changed, 218 insertions(+), 61 deletions(-) diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_array_insert.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_array_insert.explain index edcd790596b..f5096a363a3 100644 --- a/connector/connect/common/src/test/resources/query-tests/explain-results/function_array_insert.explain +++ b/connector/connect/common/src/test/resources/query-tests/explain-results/function_array_insert.explain @@ -1,2 +1,2 @@ -Project [array_insert(e#0, 0, 1) AS array_insert(e, 0, 1)#0] +Project [array_insert(e#0, 0, 1, false) AS array_insert(e, 0, 1)#0] +- LocalRelation , [id#0L, a#0, b#0, d#0, e#0, f#0, g#0] diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_array_prepend.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_array_prepend.explain index 4c3e7c85d64..1b20682b09d 100644 --- a/connector/connect/common/src/test/resources/query-tests/explain-results/function_array_prepend.explain +++ 
b/connector/connect/common/src/test/resources/query-tests/explain-results/function_array_prepend.explain @@ -1,2 +1,2 @@ -Project [array_insert(e#0, 1, 1) AS array_prepend(e, 1)#0] +Project [array_insert(e#0, 1, 1, false) AS array_prepend(e, 1)#0] +- LocalRelation , [id#0L, a#0, b#0, d#0, e#0, f#0, g#0] diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index c71b16cd8d6..5fc323ec1b0 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -29,6 +29,7 @@ license: | - Since Spark 3.5, Row's json and prettyJson methods are moved to `ToJsonUtil`. - Since Spark 3.5, the `plan` field is moved from
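The new negative-index rule can be sketched in a few lines of Python. This is a simplified model of the semantics described in the commit message, not Spark's implementation — in particular it skips the NULL padding Spark performs for out-of-range positions:

```python
def array_insert(arr, pos, item, legacy=False):
    """Sketch of array_insert indexing after SPARK-44840.

    Positive pos is 1-based from the start. Negative pos is now 1-based
    from the end, so pos = -1 appends. legacy=True models the old
    0-based negative indexing restored by the SQL config
    spark.sql.legacy.negativeIndexInArrayInsert.
    """
    if pos == 0:
        raise ValueError("array_insert position must not be 0")
    result = list(arr)
    if pos > 0:
        result.insert(pos - 1, item)           # 1-based from the start
    else:
        # new behaviour shifts negative positions by one vs. the legacy rule
        result.insert(len(result) + pos + (0 if legacy else 1), item)
    return result

print(array_insert(["a", "b"], -1, "c"))               # ['a', 'b', 'c']
print(array_insert(["a", "b"], -1, "c", legacy=True))  # ['a', 'c', 'b']
```

The two print lines reproduce the "After" and "Before" SQL examples from the commit message; note also that `array_insert(e, 1, x)` still prepends, matching `array_prepend` in the explain diffs above.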
[spark] branch branch-3.3 updated (352810b2b45 -> aa6f6f74dc9)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git from 352810b2b45 [SPARK-44920][CORE] Use await() instead of awaitUninterruptibly() in TransportClientFactory.createClient() add aa6f6f74dc9 [SPARK-44871][SQL][3.3] Fix percentile_disc behaviour No new revisions were added by this update. Summary of changes: .../expressions/aggregate/percentiles.scala| 39 +-- .../org/apache/spark/sql/internal/SQLConf.scala| 10 ++ .../resources/sql-tests/inputs/percentiles.sql | 74 + .../sql-tests/results/percentiles.sql.out | 118 + 4 files changed, 234 insertions(+), 7 deletions(-) create mode 100644 sql/core/src/test/resources/sql-tests/inputs/percentiles.sql create mode 100644 sql/core/src/test/resources/sql-tests/results/percentiles.sql.out
[spark] branch master updated: [SPARK-44975][SQL] Remove BinaryArithmetic useless override resolved
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 04339e30dbd [SPARK-44975][SQL] Remove BinaryArithmetic useless override resolved 04339e30dbd is described below commit 04339e30dbdda2805edbac7e1e3cd8dfb5c3c608 Author: Jia Fan AuthorDate: Sat Aug 26 21:11:20 2023 +0300 [SPARK-44975][SQL] Remove BinaryArithmetic useless override resolved ### What changes were proposed in this pull request? Remove the useless `resolved` override from `BinaryArithmetic`; it is exactly the same as in the abstract class `Expression`. ### Why are the changes needed? Remove useless logic. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests. ### Was this patch authored or co-authored using generative AI tooling? Closes #42689 from Hisoka-X/SPARK-44975_remove_resolved_override. Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala| 2 -- 1 file changed, 2 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala index 31d4d71cd40..2d9bccc0854 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala @@ -264,8 +264,6 @@ abstract class BinaryArithmetic extends BinaryOperator final override val nodePatterns: Seq[TreePattern] = Seq(BINARY_ARITHMETIC) - override lazy val resolved: Boolean = childrenResolved && checkInputDataTypes().isSuccess - override def initQueryContext(): Option[SQLQueryContext] = { if (failOnError) { Some(origin.context)
[spark] branch master updated: [SPARK-44983][SQL] Convert binary to string by `to_char` for the formats: `hex`, `base64`, `utf-8`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4946d025b62 [SPARK-44983][SQL] Convert binary to string by `to_char` for the formats: `hex`, `base64`, `utf-8` 4946d025b62 is described below commit 4946d025b6200ad90dfdfbb1f24526016f810523 Author: Max Gekk AuthorDate: Mon Aug 28 16:55:35 2023 +0300 [SPARK-44983][SQL] Convert binary to string by `to_char` for the formats: `hex`, `base64`, `utf-8` ### What changes were proposed in this pull request? In the PR, I propose to re-use the `Hex`, `Base64` and `Decode` expressions in the `ToCharacter` (the `to_char`/`to_varchar` functions) when the `format` parameter is one of `hex`, `base64` and `utf-8`. ### Why are the changes needed? To make the migration to Spark SQL easier from systems like: - Snowflake: https://docs.snowflake.com/en/sql-reference/functions/to_char - SAP SQL Anywhere: https://help.sap.com/docs/SAP_SQL_Anywhere/93079d4ba8e44920ae63ffb4def91f5b/81fe51196ce21014b9c6cf43b298.html - Oracle: https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/TO_CHAR-number.html#GUID-00DA076D-2468-41AB-A3AC-CC78DBA0D9CB - Vertica: https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Formatting/TO_CHAR.htm ### Does this PR introduce _any_ user-facing change? No. This PR extends the existing API. It might be considered a user-facing change only if the user's code depends on errors in the case of wrong formats. ### How was this patch tested? By running new examples: ``` $ build/sbt "sql/test:testOnly org.apache.spark.sql.expressions.ExpressionInfoSuite" ``` and new tests: ``` $ build/sbt "test:testOnly *.StringFunctionsSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42632 from MaxGekk/to_char-binary-2. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 5 ++ ...nditions-invalid-parameter-value-error-class.md | 4 ++ .../expressions/numberFormatExpressions.scala | 28 +++-- .../spark/sql/errors/QueryCompilationErrors.scala | 9 +++ .../apache/spark/sql/StringFunctionsSuite.scala| 69 +++--- 5 files changed, 89 insertions(+), 26 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 632c449b992..53c596c00fc 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -1788,6 +1788,11 @@ "expects a binary value with 16, 24 or 32 bytes, but got bytes." ] }, + "BINARY_FORMAT" : { +"message" : [ + "expects one of binary formats 'base64', 'hex', 'utf-8', but got ." +] + }, "DATETIME_UNIT" : { "message" : [ "expects one of the units without quotes YEAR, QUARTER, MONTH, WEEK, DAY, DAYOFYEAR, HOUR, MINUTE, SECOND, MILLISECOND, MICROSECOND, but got the string literal ." diff --git a/docs/sql-error-conditions-invalid-parameter-value-error-class.md b/docs/sql-error-conditions-invalid-parameter-value-error-class.md index 370e6da3362..96829e564aa 100644 --- a/docs/sql-error-conditions-invalid-parameter-value-error-class.md +++ b/docs/sql-error-conditions-invalid-parameter-value-error-class.md @@ -37,6 +37,10 @@ supports 16-byte CBC IVs and 12-byte GCM IVs, but got `` bytes for expects a binary value with 16, 24 or 32 bytes, but got `` bytes. +## BINARY_FORMAT + +expects one of binary formats 'base64', 'hex', 'utf-8', but got ``. + ## DATETIME_UNIT expects one of the units without quotes YEAR, QUARTER, MONTH, WEEK, DAY, DAYOFYEAR, HOUR, MINUTE, SECOND, MILLISECOND, MICROSECOND, but got the string literal ``. 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala index 3a424ac21c5..7875ed8fe20 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala @@ -26,7 +26,7 @@ import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, CodeGe import org.apache.spark.sql.catalyst.expressions.codegen.Block.BlockHelper import org.apache.spark.sql.catalyst.util.ToNumberParser import org.apache.spark
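The three accepted formats map directly onto standard encodings. A small Python model of the behaviour the commit describes — a sketch, not Spark's `ToCharacter` code; the upper-casing of the `hex` output is an assumption here, matching how Spark's standalone `hex()` function prints letters:

```python
import base64


def to_char_binary(value: bytes, fmt: str) -> str:
    """Sketch of to_char(binary, format) for the formats added by SPARK-44983."""
    f = fmt.lower()
    if f == "hex":
        return value.hex().upper()
    if f == "base64":
        return base64.b64encode(value).decode("ascii")
    if f == "utf-8":
        return value.decode("utf-8")
    # mirrors the new INVALID_PARAMETER_VALUE.BINARY_FORMAT error condition
    raise ValueError(
        f"expects one of binary formats 'base64', 'hex', 'utf-8', but got '{fmt}'")


print(to_char_binary(b"Spark", "base64"))  # U3Bhcms=
print(to_char_binary(b"abc", "hex"))       # 616263
```

Anything outside the three supported formats raises, which is the case the commit message flags as the only potentially user-visible behaviour change.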
[spark] branch master updated: [SPARK-44868][SQL][FOLLOWUP] Invoke the `to_varchar` function in Scala API
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 49da438ece8 [SPARK-44868][SQL][FOLLOWUP] Invoke the `to_varchar` function in Scala API 49da438ece8 is described below commit 49da438ece84391db22f9c56e747d555d9b01969 Author: Max Gekk AuthorDate: Mon Aug 28 20:57:27 2023 +0300 [SPARK-44868][SQL][FOLLOWUP] Invoke the `to_varchar` function in Scala API ### What changes were proposed in this pull request? In the PR, I propose to invoke the `to_varchar` function instead of `to_char` in `to_varchar` of the Scala/Java API. ### Why are the changes needed? 1. To show the correct function name in error messages and in `explain`. 2. To be consistent with other APIs: PySpark and the previous Spark SQL version 3.5.0. ### Does this PR introduce _any_ user-facing change? Yes. ### How was this patch tested? By running the modified test: ``` $ build/sbt "test:testOnly *.StringFunctionsSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42703 from MaxGekk/fix-to_varchar-call. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- sql/core/src/main/scala/org/apache/spark/sql/functions.scala | 2 +- .../src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala| 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala index f6699b66af9..6b474c84cdb 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala @@ -4431,7 +4431,7 @@ object functions { * @group string_funcs * @since 3.5.0 */ - def to_varchar(e: Column, format: Column): Column = to_char(e, format) + def to_varchar(e: Column, format: Column): Column = call_function("to_varchar", e, format) /** * Convert string 'e' to a number based on the string format 'format'. diff --git a/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala index 12881f4a22a..03b9053c71a 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala @@ -878,7 +878,7 @@ class StringFunctionsSuite extends QueryTest with SharedSparkSession { errorClass = "_LEGACY_ERROR_TEMP_1100", parameters = Map( "argName" -> "format", - "funcName" -> "to_char", + "funcName" -> funcName, "requiredType" -> "string")) checkError( exception = intercept[AnalysisException] { errorClass = "INVALID_PARAMETER_VALUE.BINARY_FORMAT", parameters = Map( "parameter" -> "`format`", - "functionName" -> "`to_char`", + "functionName" -> s"`$funcName`", "invalidFormat" -> "'invalid_format'")) checkError( exception = intercept[AnalysisException] {
[spark] branch master updated (8505084bc26 -> a7eef211691)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 8505084bc26 [SPARK-45003][PYTHON][DOCS] Refine docstring of `asc/desc` add a7eef211691 [SPARK-43438][SQL] Error on missing input columns in `INSERT` No new revisions were added by this update. Summary of changes: .../catalyst/analysis/TableOutputResolver.scala| 15 +- .../spark/sql/execution/datasources/rules.scala| 6 ++- .../spark/sql/ResolveDefaultColumnsSuite.scala | 59 +- .../org/apache/spark/sql/sources/InsertSuite.scala | 18 --- .../org/apache/spark/sql/hive/InsertSuite.scala| 2 +- .../spark/sql/hive/execution/HiveQuerySuite.scala | 6 +-- 6 files changed, 69 insertions(+), 37 deletions(-)
[spark] branch branch-3.5 updated: [SPARK-43438][SQL] Error on missing input columns in `INSERT`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 24bd29cc56a [SPARK-43438][SQL] Error on missing input columns in `INSERT` 24bd29cc56a is described below commit 24bd29cc56a7e12a45d713b5ca0bf2205b80a8f6 Author: Max Gekk AuthorDate: Tue Aug 29 23:04:44 2023 +0300 [SPARK-43438][SQL] Error on missing input columns in `INSERT` ### What changes were proposed in this pull request? In the PR, I propose to raise an error when a user uses V1 `INSERT` without a list of columns, and the number of inserted columns doesn't match the number of actual table columns. At the moment Spark inserts data successfully in such a case after the PR https://github.com/apache/spark/pull/41262 which changed the behaviour of Spark 3.4.x. ### Why are the changes needed? 1. To conform to the SQL standard, which requires the number of columns to be the same: ![Screenshot 2023-08-07 at 11 01 27 AM](https://github.com/apache/spark/assets/1580697/c55badec-5716-490f-a83a-0bb6b22c84c7) Apparently, the insertion below must not succeed: ```sql spark-sql (default)> CREATE TABLE tabtest(c1 INT, c2 INT); spark-sql (default)> INSERT INTO tabtest SELECT 1; ``` 2. To have the same behaviour as **Spark 3.4**: ```sql spark-sql (default)> INSERT INTO tabtest SELECT 1; `spark_catalog`.`default`.`tabtest` requires that the data to be inserted have the same number of columns as the target table: target table has 2 column(s) but the inserted data has 1 column(s), including 0 partition column(s) having constant value(s). ``` ### Does this PR introduce _any_ user-facing change? Yes. 
After the changes: ```sql spark-sql (default)> INSERT INTO tabtest SELECT 1; [INSERT_COLUMN_ARITY_MISMATCH.NOT_ENOUGH_DATA_COLUMNS] Cannot write to `spark_catalog`.`default`.`tabtest`, the reason is not enough data columns: Table columns: `c1`, `c2`. Data columns: `1`. ``` ### How was this patch tested? By running the modified tests: ``` $ build/sbt "test:testOnly *InsertSuite" $ build/sbt "test:testOnly *ResolveDefaultColumnsSuite" $ build/sbt -Phive "test:testOnly *HiveQuerySuite" ``` Closes #42393 from MaxGekk/fix-num-cols-insert. Authored-by: Max Gekk Signed-off-by: Max Gekk (cherry picked from commit a7eef2116919bd0c1a1b52adaf49de903e8c9c46) Signed-off-by: Max Gekk --- .../catalyst/analysis/TableOutputResolver.scala| 15 +- .../spark/sql/execution/datasources/rules.scala| 6 ++- .../spark/sql/ResolveDefaultColumnsSuite.scala | 59 +- .../org/apache/spark/sql/sources/InsertSuite.scala | 18 --- .../org/apache/spark/sql/hive/InsertSuite.scala| 2 +- .../spark/sql/hive/execution/HiveQuerySuite.scala | 6 +-- 6 files changed, 69 insertions(+), 37 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala index 894cd0b3991..6671836b351 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala @@ -65,22 +65,11 @@ object TableOutputResolver { errors += _, fillDefaultValue = supportColDefaultValue) } else { - // If the target table needs more columns than the input query, fill them with - // the columns' default values, if the `supportColDefaultValue` parameter is true. 
- val fillDefaultValue = supportColDefaultValue && actualExpectedCols.size > query.output.size - val queryOutputCols = if (fillDefaultValue) { -query.output ++ actualExpectedCols.drop(query.output.size).flatMap { expectedCol => - getDefaultValueExprOrNullLit(expectedCol, conf.useNullsForMissingDefaultColumnValues) -} - } else { -query.output - } - if (actualExpectedCols.size > queryOutputCols.size) { + if (actualExpectedCols.size > query.output.size) { throw QueryCompilationErrors.cannotWriteNotEnoughColumnsToTableError( tableName, actualExpectedCols.map(_.name), query) } - - resolveColumnsByPosition(tableName, queryOutputCols, actualExpectedCols, conf, errors += _) + resolveColumnsByPosition(tableName, query.output, actualExpectedCols, conf, errors += _) } if (errors.nonEmpty) { diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala b/sql/core/src/main/scala/org/apache/spark/sql/execut
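The diff above restores a strict arity check: with no explicit column list, a query producing fewer columns than the table has is rejected instead of being padded with column defaults. A hedged Python sketch of that check — the function name and exact message shape are illustrative, not Spark's `TableOutputResolver` code:

```python
def check_insert_arity(table_cols, data_cols):
    """Reject a V1 INSERT (without a column list) whose query produces
    fewer columns than the target table, per SPARK-43438."""
    if len(data_cols) < len(table_cols):
        raise ValueError(
            "[INSERT_COLUMN_ARITY_MISMATCH.NOT_ENOUGH_DATA_COLUMNS] "
            f"not enough data columns: Table columns: {table_cols}. "
            f"Data columns: {data_cols}.")


check_insert_arity(["c1", "c2"], ["a", "b"])   # OK: arity matches
try:
    check_insert_arity(["c1", "c2"], ["1"])    # INSERT INTO tabtest SELECT 1
except ValueError as e:
    print(e)
```

This mirrors the `tabtest` example from the commit message: two table columns, one data column, so the insert errors out as it did in Spark 3.4.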
[spark] branch master updated: [SPARK-44987][SQL] Assign a name to the error class `_LEGACY_ERROR_TEMP_1100`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e72ce91250a [SPARK-44987][SQL] Assign a name to the error class `_LEGACY_ERROR_TEMP_1100` e72ce91250a is described below commit e72ce91250a9a2c40fd5ed55a50dbc46e4e7e46d Author: Max Gekk AuthorDate: Thu Aug 31 22:50:21 2023 +0300 [SPARK-44987][SQL] Assign a name to the error class `_LEGACY_ERROR_TEMP_1100` ### What changes were proposed in this pull request? In the PR, I propose to assign the name `NON_FOLDABLE_ARGUMENT` to the legacy error class `_LEGACY_ERROR_TEMP_1100`, and improve the error message format: make it less restrictive. ### Why are the changes needed? 1. To avoid confusing users with a slightly restrictive error message about literals. 2. To assign a proper name as part of the activity in SPARK-37935. ### Does this PR introduce _any_ user-facing change? No. Only if the user's code depends on the error class name and message parameters. ### How was this patch tested? By running the modified and affected tests: ``` $ build/sbt "test:testOnly *.StringFunctionsSuite" $ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite" $ build/sbt "core/testOnly *SparkThrowableSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42737 from MaxGekk/assign-name-_LEGACY_ERROR_TEMP_1100.
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 11 --- docs/sql-error-conditions.md | 6 .../catalyst/expressions/datetimeExpressions.scala | 2 +- .../sql/catalyst/expressions/mathExpressions.scala | 4 +-- .../expressions/numberFormatExpressions.scala | 2 +- .../spark/sql/errors/QueryCompilationErrors.scala | 14 + .../ceil-floor-with-scale-param.sql.out| 36 -- .../sql-tests/analyzer-results/extract.sql.out | 18 ++- .../results/ceil-floor-with-scale-param.sql.out| 36 -- .../resources/sql-tests/results/extract.sql.out| 18 ++- .../apache/spark/sql/StringFunctionsSuite.scala| 8 ++--- 11 files changed, 88 insertions(+), 67 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 3b537cc3d9f..af78dd2f9f8 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -2215,6 +2215,12 @@ ], "sqlState" : "42607" }, + "NON_FOLDABLE_ARGUMENT" : { +"message" : [ + "The function requires the parameter to be a foldable expression of the type , but the actual argument is a non-foldable." +], +"sqlState" : "22024" + }, "NON_LAST_MATCHED_CLAUSE_OMIT_CONDITION" : { "message" : [ "When there are more than one MATCHED clauses in a MERGE statement, only the last MATCHED clause can omit the condition." @@ -4029,11 +4035,6 @@ "() doesn't support the mode. Acceptable modes are and ." ] }, - "_LEGACY_ERROR_TEMP_1100" : { -"message" : [ - "The '' parameter of function '' needs to be a literal." -] - }, "_LEGACY_ERROR_TEMP_1103" : { "message" : [ "Unsupported component type in arrays." 
diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md index 89c27f72ea0..33072f6c440 100644 --- a/docs/sql-error-conditions.md +++ b/docs/sql-error-conditions.md @@ -1305,6 +1305,12 @@ Cannot call function `` because named argument references are not It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query. +### NON_FOLDABLE_ARGUMENT + +[SQLSTATE: 22024](sql-error-conditions-sqlstates.html#class-22-data-exception) + +The function `` requires the parameter `` to be a foldable expression of the type ``, but the actual argument is a non-foldable. + ### NON_LAST_MATCHED_CLAUSE_OMIT_CONDITION [SQLSTATE: 42613](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala index 51ddf2b85f8..30a6bec1868 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
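The `NON_FOLDABLE_ARGUMENT` error above fires when a parameter such as the scale of `ceil(expr, scale)` cannot be reduced to a constant during analysis. What "foldable" means can be illustrated with a tiny expression tree; this is not Spark's `Expression` API, just a minimal stand-in for the idea:

```scala
// Illustrative mini expression tree (not Spark's Expression class hierarchy).
// An expression is "foldable" if it can be evaluated to a constant before
// execution; a column reference is not, so passing one where a foldable
// argument is required raises NON_FOLDABLE_ARGUMENT.
sealed trait Expr { def foldable: Boolean }
case class Literal(value: Any) extends Expr { val foldable = true }
case class ColumnRef(name: String) extends Expr { val foldable = false }
case class Add(l: Expr, r: Expr) extends Expr { def foldable = l.foldable && r.foldable }

def checkFoldableArgument(funcName: String, paramName: String, arg: Expr): Option[String] =
  if (arg.foldable) None
  else Some(s"[NON_FOLDABLE_ARGUMENT] The function $funcName requires the parameter " +
    s"$paramName to be a foldable expression.")
```

Note that `Add(Literal(1), Literal(2))` is still foldable even though it is not itself a literal, which is exactly why the new message is phrased in terms of foldability rather than literals.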
[spark] branch master updated (f2a6c97d718 -> d03ebced0ef)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from f2a6c97d718 [SPARK-44876][PYTHON][FOLLOWUP] Fix Arrow-optimized Python UDF to delay wrapping the function with fail_on_stopiteration add d03ebced0ef [SPARK-45060][SQL] Fix an internal error from `to_char()` on `NULL` format No new revisions were added by this update. Summary of changes: common/utils/src/main/resources/error/error-classes.json | 5 + ...error-conditions-invalid-parameter-value-error-class.md | 4 .../sql/catalyst/expressions/numberFormatExpressions.scala | 8 ++-- .../apache/spark/sql/errors/QueryCompilationErrors.scala | 8 .../scala/org/apache/spark/sql/StringFunctionsSuite.scala | 14 ++ 5 files changed, 37 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (b0b7835bee2 -> 416207659aa)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from b0b7835bee2 [SPARK-45059][CONNECT][PYTHON] Add `try_reflect` functions to Scala and Python add 416207659aa [SPARK-45033][SQL] Support maps by parameterized `sql()` No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/analysis/parameters.scala | 15 -- .../org/apache/spark/sql/ParametersSuite.scala | 62 +- 2 files changed, 72 insertions(+), 5 deletions(-)
[spark] branch master updated: [SPARK-45070][SQL][DOCS] Describe the binary and datetime formats of `to_char`/`to_varchar`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 637f16e7ff8 [SPARK-45070][SQL][DOCS] Describe the binary and datetime formats of `to_char`/`to_varchar` 637f16e7ff8 is described below commit 637f16e7ff88c2aef0e7f29163e13138ff472c1d Author: Max Gekk AuthorDate: Wed Sep 6 08:25:41 2023 +0300 [SPARK-45070][SQL][DOCS] Describe the binary and datetime formats of `to_char`/`to_varchar` ### What changes were proposed in this pull request? In the PR, I propose to document the recent changes related to the `format` of the `to_char`/`to_varchar` functions: 1. binary formats added by https://github.com/apache/spark/pull/42632 2. datetime formats introduced by https://github.com/apache/spark/pull/42534 ### Why are the changes needed? To inform users about recent changes. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By CI. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42801 from MaxGekk/doc-to_char-api. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../main/scala/org/apache/spark/sql/functions.scala| 18 -- python/pyspark/sql/functions.py| 12 .../main/scala/org/apache/spark/sql/functions.scala| 18 ++ 3 files changed, 46 insertions(+), 2 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala index 527848e95e6..54bf0106956 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala @@ -4280,6 +4280,7 @@ object functions { */ def to_binary(e: Column): Column = Column.fn("to_binary", e) + // scalastyle:off line.size.limit /** * Convert `e` to a string based on the `format`. Throws an exception if the conversion fails. * @@ -4300,13 +4301,20 @@ object functions { * (optional, only allowed once at the beginning or end of the format string). Note that 'S' * prints '+' for positive values but 'MI' prints a space. 'PR': Only allowed at the * end of the format string; specifies that the result string will be wrapped by angle - * brackets if the input value is negative. + * brackets if the input value is negative. If `e` is a datetime, `format` shall be + * a valid datetime pattern, see https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html";>Datetime + * Patterns. If `e` is a binary, it is converted to a string in one of the formats: + * 'base64': a base 64 string. 'hex': a string in the hexadecimal format. + * 'utf-8': the input binary is decoded to UTF-8 string. * * @group string_funcs * @since 3.5.0 */ + // scalastyle:on line.size.limit def to_char(e: Column, format: Column): Column = Column.fn("to_char", e, format) + // scalastyle:off line.size.limit /** * Convert `e` to a string based on the `format`. Throws an exception if the conversion fails. 
* @@ -4327,11 +4335,17 @@ object functions { * (optional, only allowed once at the beginning or end of the format string). Note that 'S' * prints '+' for positive values but 'MI' prints a space. 'PR': Only allowed at the * end of the format string; specifies that the result string will be wrapped by angle - * brackets if the input value is negative. + * brackets if the input value is negative. If `e` is a datetime, `format` shall be + * a valid datetime pattern, see https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html";>Datetime + * Patterns. If `e` is a binary, it is converted to a string in one of the formats: + * 'base64': a base 64 string. 'hex': a string in the hexadecimal format. + * 'utf-8': the input binary is decoded to UTF-8 string. * * @group string_funcs * @since 3.5.0 */ + // scalastyle:on line.size.limit def to_varchar(e: Column, format: Column): Column = Column.fn("to_varchar", e, format) /** diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py index 56b436421af..de91cced206 100644 --- a/python/pyspark/sql/functions.py +++ b/python/pyspark/sql/functions.py @@ -10902,6 +10902,12 @@ def to_char(col: "ColumnOrName", format: "ColumnOrName") -> Column: values but 'MI' prints a space. &
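The three binary formats documented above ('base64', 'hex', 'utf-8') can be sketched with plain JDK utilities. This is not Spark's implementation (which lives in the number-format expressions), only a Spark-free illustration of the observable conversions; the helper name `binaryToString` is invented:

```scala
// Minimal sketch of to_char/to_varchar's documented binary-input formats.
import java.nio.charset.StandardCharsets
import java.util.Base64

def binaryToString(bytes: Array[Byte], format: String): String = format.toLowerCase match {
  case "base64" => Base64.getEncoder.encodeToString(bytes)          // base 64 string
  case "hex"    => bytes.map(b => f"${b & 0xff}%02X").mkString      // hexadecimal string
  case "utf-8"  => new String(bytes, StandardCharsets.UTF_8)        // decode as UTF-8 text
  case other    => throw new IllegalArgumentException(s"Unsupported format: $other")
}
```

The `b & 0xff` masking matters: formatting a negative `Byte` directly with `%02X` would sign-extend it to an `Int` and print eight hex digits instead of two.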
[spark] branch master updated: [SPARK-45079][SQL] Fix an internal error from `percentile_approx()` on `NULL` accuracy
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 24b29adcf53 [SPARK-45079][SQL] Fix an internal error from `percentile_approx()`on `NULL` accuracy 24b29adcf53 is described below commit 24b29adcf53616067a9fa2ca201e3f4d2f54436b Author: Max Gekk AuthorDate: Wed Sep 6 10:32:37 2023 +0300 [SPARK-45079][SQL] Fix an internal error from `percentile_approx()`on `NULL` accuracy ### What changes were proposed in this pull request? In the PR, I propose to check the `accuracy` argument is not a NULL in `ApproximatePercentile`. And if it is, throw an `AnalysisException` with new error class `DATATYPE_MISMATCH.UNEXPECTED_NULL`. ### Why are the changes needed? To fix the issue demonstrated by the example: ```sql $ spark-sql (default)> SELECT percentile_approx(col, array(0.5, 0.4, 0.1), NULL) FROM VALUES (0), (1), (2), (10) AS tab(col); [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running new test: ``` $ build/sbt "test:testOnly *.ApproximatePercentileQuerySuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42817 from MaxGekk/fix-internal-error-in-percentile_approx. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../aggregate/ApproximatePercentile.scala | 7 - .../sql/ApproximatePercentileQuerySuite.scala | 31 ++ 2 files changed, 37 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala index 3c3afc1c7e7..5b44c3fa31b 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala @@ -97,7 +97,8 @@ case class ApproximatePercentile( } // Mark as lazy so that accuracyExpression is not evaluated during tree transformation. - private lazy val accuracy: Long = accuracyExpression.eval().asInstanceOf[Number].longValue + private lazy val accuracyNum = accuracyExpression.eval().asInstanceOf[Number] + private lazy val accuracy: Long = accuracyNum.longValue override def inputTypes: Seq[AbstractDataType] = { // Support NumericType, DateType, TimestampType and TimestampNTZType since their internal types @@ -138,6 +139,10 @@ case class ApproximatePercentile( "inputExpr" -> toSQLExpr(accuracyExpression) ) ) +} else if (accuracyNum == null) { + DataTypeMismatch( +errorSubClass = "UNEXPECTED_NULL", +messageParameters = Map("exprName" -> "accuracy")) } else if (accuracy <= 0 || accuracy > Int.MaxValue) { DataTypeMismatch( errorSubClass = "VALUE_OUT_OF_RANGE", diff --git a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala index 18e8dd6249b..273e8e08fd7 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala @@ -339,4 +339,35 @@ class 
ApproximatePercentileQuerySuite extends QueryTest with SharedSparkSession Row(Period.ofMonths(200).normalized(), null, Duration.ofSeconds(200L))) } } + + test("SPARK-45079: NULL arguments of percentile_approx") { +checkError( + exception = intercept[AnalysisException] { +sql( + """ +|SELECT percentile_approx(col, array(0.5, 0.4, 0.1), NULL) +|FROM VALUES (0), (1), (2), (10) AS tab(col); +|""".stripMargin).collect() + }, + errorClass = "DATATYPE_MISMATCH.UNEXPECTED_NULL", + parameters = Map( +"exprName" -> "accuracy", +"sqlExpr" -> "\"percentile_approx(col, array(0.5, 0.4, 0.1), NULL)\""), + context = ExpectedContext( +"", "", 8, 57, "percentile_approx(col, array(0.5, 0.4, 0.1), NULL)")) +checkError( + exception = intercept[AnalysisException] {
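The essence of the fix in the diff above is to evaluate the accuracy argument once as a boxed `Number` and report `UNEXPECTED_NULL` before unboxing it: the old code called `.longValue` directly on the evaluation result, so a `NULL` accuracy surfaced as an internal error rather than a proper `DATATYPE_MISMATCH`. A Spark-free sketch of that check order (the function name and error strings here are illustrative):

```scala
// Sketch of the null-before-unbox check added to ApproximatePercentile.
def checkAccuracy(accuracyNum: java.lang.Number): Either[String, Long] = {
  if (accuracyNum == null) {
    // Checked first: unboxing a null Number would throw a NullPointerException.
    Left("DATATYPE_MISMATCH.UNEXPECTED_NULL: accuracy")
  } else {
    val accuracy = accuracyNum.longValue
    if (accuracy <= 0 || accuracy > Int.MaxValue) {
      Left("DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE: accuracy")
    } else {
      Right(accuracy)
    }
  }
}
```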
[spark] branch branch-3.5 updated: [SPARK-45079][SQL] Fix an internal error from `percentile_approx()` on `NULL` accuracy
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 9b750e93035 [SPARK-45079][SQL] Fix an internal error from `percentile_approx()`on `NULL` accuracy 9b750e93035 is described below commit 9b750e930357eae092420f09ca9366e49dc589e2 Author: Max Gekk AuthorDate: Wed Sep 6 10:32:37 2023 +0300 [SPARK-45079][SQL] Fix an internal error from `percentile_approx()`on `NULL` accuracy ### What changes were proposed in this pull request? In the PR, I propose to check the `accuracy` argument is not a NULL in `ApproximatePercentile`. And if it is, throw an `AnalysisException` with new error class `DATATYPE_MISMATCH.UNEXPECTED_NULL`. ### Why are the changes needed? To fix the issue demonstrated by the example: ```sql $ spark-sql (default)> SELECT percentile_approx(col, array(0.5, 0.4, 0.1), NULL) FROM VALUES (0), (1), (2), (10) AS tab(col); [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running new test: ``` $ build/sbt "test:testOnly *.ApproximatePercentileQuerySuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42817 from MaxGekk/fix-internal-error-in-percentile_approx. 
Authored-by: Max Gekk Signed-off-by: Max Gekk (cherry picked from commit 24b29adcf53616067a9fa2ca201e3f4d2f54436b) Signed-off-by: Max Gekk --- .../aggregate/ApproximatePercentile.scala | 7 - .../sql/ApproximatePercentileQuerySuite.scala | 31 ++ 2 files changed, 37 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala index 3c3afc1c7e7..5b44c3fa31b 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala @@ -97,7 +97,8 @@ case class ApproximatePercentile( } // Mark as lazy so that accuracyExpression is not evaluated during tree transformation. - private lazy val accuracy: Long = accuracyExpression.eval().asInstanceOf[Number].longValue + private lazy val accuracyNum = accuracyExpression.eval().asInstanceOf[Number] + private lazy val accuracy: Long = accuracyNum.longValue override def inputTypes: Seq[AbstractDataType] = { // Support NumericType, DateType, TimestampType and TimestampNTZType since their internal types @@ -138,6 +139,10 @@ case class ApproximatePercentile( "inputExpr" -> toSQLExpr(accuracyExpression) ) ) +} else if (accuracyNum == null) { + DataTypeMismatch( +errorSubClass = "UNEXPECTED_NULL", +messageParameters = Map("exprName" -> "accuracy")) } else if (accuracy <= 0 || accuracy > Int.MaxValue) { DataTypeMismatch( errorSubClass = "VALUE_OUT_OF_RANGE", diff --git a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala index 18e8dd6249b..273e8e08fd7 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala +++ 
b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala @@ -339,4 +339,35 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSparkSession Row(Period.ofMonths(200).normalized(), null, Duration.ofSeconds(200L))) } } + + test("SPARK-45079: NULL arguments of percentile_approx") { +checkError( + exception = intercept[AnalysisException] { +sql( + """ +|SELECT percentile_approx(col, array(0.5, 0.4, 0.1), NULL) +|FROM VALUES (0), (1), (2), (10) AS tab(col); +|""".stripMargin).collect() + }, + errorClass = "DATATYPE_MISMATCH.UNEXPECTED_NULL", + parameters = Map( +"exprName" -> "accuracy", +"sqlExpr" -> "\"percentile_approx(col, array(0.5, 0.4, 0.1), NULL)\""), + context = ExpectedContext( +"", "", 8, 57, "percentile_approx(col, array(0.5
[spark] branch branch-3.4 updated: [SPARK-45079][SQL] Fix an internal error from `percentile_approx()` on `NULL` accuracy
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new f0b421553bc [SPARK-45079][SQL] Fix an internal error from `percentile_approx()`on `NULL` accuracy f0b421553bc is described below commit f0b421553bc1850cc3e8ed5d564da8f6425cd244 Author: Max Gekk AuthorDate: Wed Sep 6 10:32:37 2023 +0300 [SPARK-45079][SQL] Fix an internal error from `percentile_approx()`on `NULL` accuracy ### What changes were proposed in this pull request? In the PR, I propose to check the `accuracy` argument is not a NULL in `ApproximatePercentile`. And if it is, throw an `AnalysisException` with new error class `DATATYPE_MISMATCH.UNEXPECTED_NULL`. ### Why are the changes needed? To fix the issue demonstrated by the example: ```sql $ spark-sql (default)> SELECT percentile_approx(col, array(0.5, 0.4, 0.1), NULL) FROM VALUES (0), (1), (2), (10) AS tab(col); [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running new test: ``` $ build/sbt "test:testOnly *.ApproximatePercentileQuerySuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42817 from MaxGekk/fix-internal-error-in-percentile_approx. 
Authored-by: Max Gekk Signed-off-by: Max Gekk (cherry picked from commit 24b29adcf53616067a9fa2ca201e3f4d2f54436b) Signed-off-by: Max Gekk --- .../aggregate/ApproximatePercentile.scala | 7 - .../sql/ApproximatePercentileQuerySuite.scala | 31 ++ 2 files changed, 37 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala index 1499f358ac4..ebf1085c0c1 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala @@ -96,7 +96,8 @@ case class ApproximatePercentile( } // Mark as lazy so that accuracyExpression is not evaluated during tree transformation. - private lazy val accuracy: Long = accuracyExpression.eval().asInstanceOf[Number].longValue + private lazy val accuracyNum = accuracyExpression.eval().asInstanceOf[Number] + private lazy val accuracy: Long = accuracyNum.longValue override def inputTypes: Seq[AbstractDataType] = { // Support NumericType, DateType, TimestampType and TimestampNTZType since their internal types @@ -137,6 +138,10 @@ case class ApproximatePercentile( "inputExpr" -> toSQLExpr(accuracyExpression) ) ) +} else if (accuracyNum == null) { + DataTypeMismatch( +errorSubClass = "UNEXPECTED_NULL", +messageParameters = Map("exprName" -> "accuracy")) } else if (accuracy <= 0 || accuracy > Int.MaxValue) { DataTypeMismatch( errorSubClass = "VALUE_OUT_OF_RANGE", diff --git a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala index 9237c9e9486..8598e92f029 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala +++ 
b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala @@ -337,4 +337,35 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSparkSession Row(Period.ofMonths(200).normalized(), null, Duration.ofSeconds(200L))) } } + + test("SPARK-45079: NULL arguments of percentile_approx") { +checkError( + exception = intercept[AnalysisException] { +sql( + """ +|SELECT percentile_approx(col, array(0.5, 0.4, 0.1), NULL) +|FROM VALUES (0), (1), (2), (10) AS tab(col); +|""".stripMargin).collect() + }, + errorClass = "DATATYPE_MISMATCH.UNEXPECTED_NULL", + parameters = Map( +"exprName" -> "accuracy", +"sqlExpr" -> "\"percentile_approx(col, array(0.5, 0.4, 0.1), NULL)\""), + context = ExpectedContext( +"", "", 8, 57, "percentile_approx(col, array(0.5
[spark] branch branch-3.3 updated: [SPARK-45079][SQL][3.3] Fix an internal error from `percentile_approx()` on `NULL` accuracy
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 5250ed65cf2 [SPARK-45079][SQL][3.3] Fix an internal error from `percentile_approx()` on `NULL` accuracy 5250ed65cf2 is described below commit 5250ed65cf2c70e4b456c96c1006b854f56ef1f2 Author: Max Gekk AuthorDate: Wed Sep 6 18:56:14 2023 +0300 [SPARK-45079][SQL][3.3] Fix an internal error from `percentile_approx()` on `NULL` accuracy ### What changes were proposed in this pull request? In the PR, I propose to check the `accuracy` argument is not a NULL in `ApproximatePercentile`. And if it is, throw an `AnalysisException` with new error class `DATATYPE_MISMATCH.UNEXPECTED_NULL`. This is a backport of https://github.com/apache/spark/pull/42817. ### Why are the changes needed? To fix the issue demonstrated by the example: ```sql $ spark-sql (default)> SELECT percentile_approx(col, array(0.5, 0.4, 0.1), NULL) FROM VALUES (0), (1), (2), (10) AS tab(col); [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running new test: ``` $ build/sbt "test:testOnly *.ApproximatePercentileQuerySuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Authored-by: Max Gekk (cherry picked from commit 24b29adcf53616067a9fa2ca201e3f4d2f54436b) Closes #42835 from MaxGekk/fix-internal-error-in-percentile_approx-3.3. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../expressions/aggregate/ApproximatePercentile.scala | 5 - .../spark/sql/ApproximatePercentileQuerySuite.scala | 19 +++ 2 files changed, 23 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala index d8eccc075a2..b816e4a9719 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala @@ -95,7 +95,8 @@ case class ApproximatePercentile( } // Mark as lazy so that accuracyExpression is not evaluated during tree transformation. - private lazy val accuracy: Long = accuracyExpression.eval().asInstanceOf[Number].longValue + private lazy val accuracyNum = accuracyExpression.eval().asInstanceOf[Number] + private lazy val accuracy: Long = accuracyNum.longValue override def inputTypes: Seq[AbstractDataType] = { // Support NumericType, DateType, TimestampType and TimestampNTZType since their internal types @@ -120,6 +121,8 @@ case class ApproximatePercentile( defaultCheck } else if (!percentageExpression.foldable || !accuracyExpression.foldable) { TypeCheckFailure(s"The accuracy or percentage provided must be a constant literal") +} else if (accuracyNum == null) { + TypeCheckFailure("Accuracy value must not be null") } else if (accuracy <= 0 || accuracy > Int.MaxValue) { TypeCheckFailure(s"The accuracy provided must be a literal between (0, ${Int.MaxValue}]" + s" (current value = $accuracy)") diff --git a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala index 9237c9e9486..3fd1592a107 100644 --- 
a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala @@ -337,4 +337,23 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSparkSession Row(Period.ofMonths(200).normalized(), null, Duration.ofSeconds(200L))) } } + + test("SPARK-45079: NULL arguments of percentile_approx") { +val e1 = intercept[AnalysisException] { + sql( +""" + |SELECT percentile_approx(col, array(0.5, 0.4, 0.1), NULL) + |FROM VALUES (0), (1), (2), (10) AS tab(col); + |""".stripMargin).collect() +} +assert(e1.getMessage.contains("Accuracy value must not be null")) +val e2 = intercept[AnalysisException] { + sql( +""" + |SELECT percentile_approx(col, NULL
[spark] branch master updated (aaf413ce351 -> fd424caf6c4)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from aaf413ce351 [SPARK-44508][PYTHON][DOCS] Add user guide for Python user-defined table functions add fd424caf6c4 [SPARK-45100][SQL] Fix an internal error from `reflect()` on `NULL` class and method No new revisions were added by this update. Summary of changes: .../expressions/CallMethodViaReflection.scala| 8 .../org/apache/spark/sql/MiscFunctionsSuite.scala| 20 2 files changed, 28 insertions(+)
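What `reflect('java.util.UUID', 'fromString', ...)` does can be sketched with plain Java reflection: resolve a class and a static method by name, then invoke it. With a `NULL` class or method name the lookup itself would throw, which is why the patch adds an `UNEXPECTED_NULL` check up front instead of letting analysis die with an internal error. The helper below is a hedged illustration, not Spark's `CallMethodViaReflection` code:

```scala
// Hypothetical sketch of reflect()'s null checks followed by static invocation.
def callStatic(className: String, methodName: String, arg: String): Either[String, String] = {
  if (className == null) {
    Left("DATATYPE_MISMATCH.UNEXPECTED_NULL: `class`")
  } else if (methodName == null) {
    Left("DATATYPE_MISMATCH.UNEXPECTED_NULL: `method`")
  } else {
    // Static method => invoke with a null receiver.
    val m = Class.forName(className).getMethod(methodName, classOf[String])
    Right(String.valueOf(m.invoke(null, arg)))
  }
}
```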
[spark] branch branch-3.5 updated: [SPARK-45100][SQL] Fix an internal error from `reflect()` on `NULL` class and method
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 8f730e779ff [SPARK-45100][SQL] Fix an internal error from `reflect()`on `NULL` class and method 8f730e779ff is described below commit 8f730e779ff64773beb20ad633151e866cfff7f2 Author: Max Gekk AuthorDate: Fri Sep 8 11:12:54 2023 +0300 [SPARK-45100][SQL] Fix an internal error from `reflect()`on `NULL` class and method ### What changes were proposed in this pull request? In the PR, I propose to check that the `class` and `method` arguments are not a NULL in `CallMethodViaReflection`. And if they are, throw an `AnalysisException` with new error class `DATATYPE_MISMATCH.UNEXPECTED_NULL`. ### Why are the changes needed? To fix the issue demonstrated by the example: ```sql $ spark-sql (default)> select reflect('java.util.UUID', CAST(NULL AS STRING)); [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running new test: ``` $ build/sbt "test:testOnly *.MiscFunctionsSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42849 from MaxGekk/fix-internal-error-in-reflect. 
Authored-by: Max Gekk Signed-off-by: Max Gekk (cherry picked from commit fd424caf6c46e7030ac2deb2afbe3f4a5fc1095c) Signed-off-by: Max Gekk --- .../expressions/CallMethodViaReflection.scala| 8 .../org/apache/spark/sql/MiscFunctionsSuite.scala| 20 2 files changed, 28 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala index 52b057a3276..4511b5b548d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala @@ -78,6 +78,10 @@ case class CallMethodViaReflection(children: Seq[Expression]) "inputExpr" -> toSQLExpr(children.head) ) ) +case (e, 0) if e.eval() == null => + DataTypeMismatch( +errorSubClass = "UNEXPECTED_NULL", +messageParameters = Map("exprName" -> toSQLId("class"))) case (e, 1) if !(e.dataType == StringType && e.foldable) => DataTypeMismatch( errorSubClass = "NON_FOLDABLE_INPUT", @@ -87,6 +91,10 @@ case class CallMethodViaReflection(children: Seq[Expression]) "inputExpr" -> toSQLExpr(children(1)) ) ) +case (e, 1) if e.eval() == null => + DataTypeMismatch( +errorSubClass = "UNEXPECTED_NULL", +messageParameters = Map("exprName" -> toSQLId("method"))) case (e, idx) if idx > 1 && !CallMethodViaReflection.typeMapping.contains(e.dataType) => DataTypeMismatch( errorSubClass = "UNEXPECTED_INPUT_TYPE", diff --git a/sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala index 074556fa2f9..b890ae73fb6 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala @@ -232,6 +232,26 @@ class MiscFunctionsSuite extends QueryTest with SharedSparkSession { 
Seq(Row("a5cf6c42-0c85-418f-af6c-3e4e5b1328f2"))) checkAnswer(df.select(reflect(lit("java.util.UUID"), lit("fromString"), col("a"))), Seq(Row("a5cf6c42-0c85-418f-af6c-3e4e5b1328f2"))) + +checkError( + exception = intercept[AnalysisException] { +df.selectExpr("reflect(cast(null as string), 'fromString', a)") + }, + errorClass = "DATATYPE_MISMATCH.UNEXPECTED_NULL", + parameters = Map( +"exprName" -> "`class`", +"sqlExpr" -> "\"reflect(CAST(NULL AS STRING), fromString, a)\""), + context = ExpectedContext("", "", 0, 45, "reflect(cast(null as string), 'fromString', a)")) +checkError( + exception = inte
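The per-position checks added in the diff above can be pictured with a small sketch. This is a hypothetical Python model of the validation pattern, not Spark's actual API: it checks the `class` (position 0) and `method` (position 1) arguments of `reflect()` for NULL and produces a `DATATYPE_MISMATCH.UNEXPECTED_NULL`-shaped error descriptor.

```python
def check_reflect_args(class_name, method_name):
    """Return an UNEXPECTED_NULL-style error dict for the first NULL
    argument, or None when both arguments pass the check."""
    for expr_name, value in (("class", class_name), ("method", method_name)):
        if value is None:
            # Mirrors DataTypeMismatch(errorSubClass = "UNEXPECTED_NULL",
            # messageParameters = Map("exprName" -> toSQLId(...)))
            return {
                "errorSubClass": "UNEXPECTED_NULL",
                "messageParameters": {"exprName": f"`{expr_name}`"},
            }
    return None  # arguments pass the NULL check

print(check_reflect_args(None, "fromString"))
```

As in the diff, the first failing position wins: a NULL `class` is reported before a NULL `method` would be.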
[spark] branch branch-3.3 updated: [SPARK-45100][SQL][3.3] Fix an internal error from `reflect()` on `NULL` class and method
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new a4d40e8a355 [SPARK-45100][SQL][3.3] Fix an internal error from `reflect()` on `NULL` class and method a4d40e8a355 is described below commit a4d40e8a355f451c6340dec0c90a332434433a75 Author: Max Gekk AuthorDate: Fri Sep 8 18:59:22 2023 +0300 [SPARK-45100][SQL][3.3] Fix an internal error from `reflect()` on `NULL` class and method ### What changes were proposed in this pull request? In the PR, I propose to check that the `class` and `method` arguments are not NULL in `CallMethodViaReflection`, and if they are, throw an `AnalysisException` with the new error class `DATATYPE_MISMATCH.UNEXPECTED_NULL`. This is a backport of https://github.com/apache/spark/pull/42849. ### Why are the changes needed? To fix the issue demonstrated by the example: ```sql $ spark-sql (default)> select reflect('java.util.UUID', CAST(NULL AS STRING)); [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running the new test: ``` $ build/sbt "test:testOnly *.MiscFunctionsSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Authored-by: Max Gekk (cherry picked from commit fd424caf6c46e7030ac2deb2afbe3f4a5fc1095c) Closes #42856 from MaxGekk/fix-internal-error-in-reflect-3.3. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../spark/sql/catalyst/expressions/CallMethodViaReflection.scala | 2 ++ .../src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala | 8 2 files changed, 10 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala index 7cb830d1156..9764d9db7f0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala @@ -65,6 +65,8 @@ case class CallMethodViaReflection(children: Seq[Expression]) } else if (!children.take(2).forall(e => e.dataType == StringType && e.foldable)) { // The first two arguments must be string type. TypeCheckFailure("first two arguments should be string literals") +} else if (children.take(2).exists(_.eval() == null)) { + TypeCheckFailure("first two arguments must be non-NULL") } else if (!classExists) { TypeCheckFailure(s"class $className not found") } else if (children.slice(2, children.length) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala index 37ba52023dd..18262ccd407 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala @@ -34,6 +34,14 @@ class MiscFunctionsSuite extends QueryTest with SharedSparkSession { s"reflect('$className', 'method1', a, b)", s"java_method('$className', 'method1', a, b)"), Row("m1one", "m1one")) +val e1 = intercept[AnalysisException] { + df.selectExpr("reflect(cast(null as string), 'fromString', a)") +} +assert(e1.getMessage.contains("first two arguments must be non-NULL")) +val e2 = intercept[AnalysisException] { + 
df.selectExpr("reflect('java.util.UUID', cast(null as string), a)") +} +assert(e2.getMessage.contains("first two arguments must be non-NULL")) } test("version") { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-43251][SQL] Replace the error class `_LEGACY_ERROR_TEMP_2015` with an internal error
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c7ea3f7d53d [SPARK-43251][SQL] Replace the error class `_LEGACY_ERROR_TEMP_2015` with an internal error c7ea3f7d53d is described below commit c7ea3f7d53d5a7674f3da0db07018c1f0c43dbf6 Author: dengziming AuthorDate: Mon Sep 11 18:28:31 2023 +0300 [SPARK-43251][SQL] Replace the error class `_LEGACY_ERROR_TEMP_2015` with an internal error ### What changes were proposed in this pull request? Replace the legacy error class `_LEGACY_ERROR_TEMP_2015` with an internal error as it is not triggered by the user space. ### Why are the changes needed? As the error is not triggered by the user space, the legacy error class can be replaced by an internal error. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing test cases. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42845 from dengziming/SPARK-43251. Authored-by: dengziming Signed-off-by: Max Gekk --- common/utils/src/main/resources/error/error-classes.json | 5 - .../scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala | 9 +++-- 2 files changed, 3 insertions(+), 11 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 2954d8b9338..282af8c199d 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -4944,11 +4944,6 @@ "Negative values found in " ] }, - "_LEGACY_ERROR_TEMP_2015" : { -"message" : [ - "Cannot generate code for incomparable type: ." -] - }, "_LEGACY_ERROR_TEMP_2016" : { "message" : [ "Can not interpolate into code block." 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala index 2d655be0e70..417ba38c66f 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala @@ -405,12 +405,9 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase with ExecutionE } def cannotGenerateCodeForIncomparableTypeError( - codeType: String, dataType: DataType): SparkIllegalArgumentException = { -new SparkIllegalArgumentException( - errorClass = "_LEGACY_ERROR_TEMP_2015", - messageParameters = Map( -"codeType" -> codeType, -"dataType" -> dataType.catalogString)) + codeType: String, dataType: DataType): Throwable = { +SparkException.internalError( + s"Cannot generate $codeType code for incomparable type: ${toSQLType(dataType)}.") } def cannotInterpolateClassIntoCodeBlockError(arg: Any): SparkIllegalArgumentException = {
[spark] branch master updated (fa2bc21ba1e -> 6565ae47cae)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from fa2bc21ba1e [SPARK-45110][BUILD] Upgrade rocksdbjni to 8.5.3 add 6565ae47cae [SPARK-43252][SQL] Replace the error class `_LEGACY_ERROR_TEMP_2016` with an internal error No new revisions were added by this update. Summary of changes: common/utils/src/main/resources/error/error-classes.json| 5 - .../org/apache/spark/sql/errors/QueryExecutionErrors.scala | 6 ++ .../sql/catalyst/expressions/codegen/CodeBlockSuite.scala | 13 - 3 files changed, 10 insertions(+), 14 deletions(-)
[spark] branch master updated (6565ae47cae -> d8129f837c4)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 6565ae47cae [SPARK-43252][SQL] Replace the error class `_LEGACY_ERROR_TEMP_2016` with an internal error add d8129f837c4 [SPARK-45085][SQL] Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and refactor some logic No new revisions were added by this update. Summary of changes: R/pkg/tests/fulltests/test_sparkSQL.R | 2 +- .../src/main/resources/error/error-classes.json| 17 -- docs/sql-error-conditions.md | 8 --- .../spark/sql/catalyst/analysis/Analyzer.scala | 19 +++ .../sql/catalyst/analysis/v2ResolutionPlans.scala | 4 +- .../spark/sql/errors/QueryCompilationErrors.scala | 52 +++-- .../analyzer-results/change-column.sql.out | 8 +-- .../sql-tests/results/change-column.sql.out| 8 +-- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 4 +- .../apache/spark/sql/execution/SQLViewSuite.scala | 66 +++--- .../spark/sql/execution/SQLViewTestSuite.scala | 4 +- .../spark/sql/execution/command/DDLSuite.scala | 6 +- .../execution/command/TruncateTableSuiteBase.scala | 10 ++-- .../execution/command/v1/ShowPartitionsSuite.scala | 10 ++-- .../apache/spark/sql/internal/CatalogSuite.scala | 4 +- 15 files changed, 80 insertions(+), 142 deletions(-)
[spark] branch master updated: [SPARK-44911][SQL] Create hive table with invalid column should return error class
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1e03db36a93 [SPARK-44911][SQL] Create hive table with invalid column should return error class 1e03db36a93 is described below commit 1e03db36a939aea5b4d55059967ccde96cb29564 Author: ming95 <505306...@qq.com> AuthorDate: Tue Sep 12 11:55:08 2023 +0300 [SPARK-44911][SQL] Create hive table with invalid column should return error class ### What changes were proposed in this pull request? Creating a Hive table with an invalid column name should return an error class. Run the SQL: ``` create table test stored as parquet as select id, date'2018-01-01' + make_dt_interval(0, id) from range(0, 10) ``` Before this change, the error was: ``` org.apache.spark.sql.AnalysisException: Cannot create a table having a column whose name contains commas in Hive metastore. Table: `spark_catalog`.`default`.`test`; Column: DATE '2018-01-01' + make_dt_interval(0, id, 0, 0.00) at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$verifyDataSchema$4(HiveExternalCatalog.scala:175) at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$verifyDataSchema$4$adapted(HiveExternalCatalog.scala:171) at scala.collection.Iterator.foreach(Iterator.scala:943) ``` After this change: ``` Exception in thread "main" org.apache.spark.sql.AnalysisException: [INVALID_HIVE_COLUMN_NAME] Cannot create the table `spark_catalog`.`default`.`parquet_ds1` having the column `DATE '2018-01-01' + make_dt_interval(0, id, 0, 0`.`00)` whose name contains invalid characters ',' in Hive metastore. 
at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$verifyDataSchema$4(HiveExternalCatalog.scala:180) at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$verifyDataSchema$4$adapted(HiveExternalCatalog.scala:171) at scala.collection.Iterator.foreach(Iterator.scala:943) ``` ### Why are the changes needed? as above ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? add UT ### Was this patch authored or co-authored using generative AI tooling? no Closes #42609 from ming95/SPARK-44911. Authored-by: ming95 <505306...@qq.com> Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 2 +- docs/sql-error-conditions.md | 2 +- .../spark/sql/hive/HiveExternalCatalog.scala | 11 --- .../spark/sql/hive/execution/HiveDDLSuite.scala| 21 .../spark/sql/hive/execution/SQLQuerySuite.scala | 23 +++--- 5 files changed, 47 insertions(+), 12 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 415bdbaf42a..4740ed72f89 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -1587,7 +1587,7 @@ }, "INVALID_HIVE_COLUMN_NAME" : { "message" : [ - "Cannot create the table having the nested column whose name contains invalid characters in Hive metastore." + "Cannot create the table having the column whose name contains invalid characters in Hive metastore." ] }, "INVALID_IDENTIFIER" : { diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md index 0d54938593c..444c2b7c0d1 100644 --- a/docs/sql-error-conditions.md +++ b/docs/sql-error-conditions.md @@ -971,7 +971,7 @@ For more details see [INVALID_HANDLE](sql-error-conditions-invalid-handle-error- SQLSTATE: none assigned -Cannot create the table `` having the nested column `` whose name contains invalid characters `` in Hive metastore. 
+Cannot create the table `` having the column `` whose name contains invalid characters `` in Hive metastore. ### INVALID_IDENTIFIER diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala index e4325989b70..67292460bbc 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala @@ -42,7 +42,7 @@ import org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils._ import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.types.DataTypeUtils import org.apache.spark.sql.catalyst.util.{CaseInsensitiveMap, CharVarcharUtils} -import or
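As a rough model of the check described above (purely illustrative; the real logic lives in `HiveExternalCatalog`'s schema verification, and the invalid-character list here is an assumption), the improved error reports which invalid character a column name contains rather than a generic "contains commas" message:

```python
def verify_column_name(table, column, invalid_chars=(",",)):
    """Raise an INVALID_HIVE_COLUMN_NAME-style error when the column name
    contains a character the Hive metastore cannot accept."""
    for ch in invalid_chars:
        if ch in column:
            raise ValueError(
                f"[INVALID_HIVE_COLUMN_NAME] Cannot create the table {table} "
                f"having the column `{column}` whose name contains invalid "
                f"characters '{ch}' in Hive metastore.")

verify_column_name("`default`.`test`", "id")  # a plain name passes
```

An unaliased expression like `DATE '2018-01-01' + make_dt_interval(0, id)` becomes a column name containing commas, which is what trips this check; aliasing the expression avoids the error entirely.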
[spark] branch master updated: [SPARK-45162][SQL] Support maps and array parameters constructed via `call_function`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new cd672b09ac6 [SPARK-45162][SQL] Support maps and array parameters constructed via `call_function` cd672b09ac6 is described below commit cd672b09ac69724cd99dc12c9bb49dd117025be1 Author: Max Gekk AuthorDate: Thu Sep 14 11:31:56 2023 +0300 [SPARK-45162][SQL] Support maps and array parameters constructed via `call_function` ### What changes were proposed in this pull request? In the PR, I propose to move the `BindParameters` rules from the `Substitution` to the `Resolution` batch, and change types of the `args` parameter of `NameParameterizedQuery` and `PosParameterizedQuery` to an `Iterable` to resolve argument expressions. ### Why are the changes needed? After the PR, the parameterized `sql()` allows map/array/struct constructed by functions like `map()`, `array()`, and `struct()`, but the same functions invoked via `call_function` are not supported: ```scala scala> sql("SELECT element_at(:mapParam, 'a')", Map("mapParam" -> call_function("map", lit("a"), lit(1)))) org.apache.spark.sql.catalyst.ExtendedAnalysisException: [UNBOUND_SQL_PARAMETER] Found the unbound parameter: mapParam. Please, fix `args` and provide a mapping of the parameter to a SQL literal.; line 1 pos 18; ``` ### Does this PR introduce _any_ user-facing change? No, should not since it fixes an issue. Only if user code depends on the error message. After the changes: ```scala scala> sql("SELECT element_at(:mapParam, 'a')", Map("mapParam" -> call_function("map", lit("a"), lit(1)))).show(false) ++ |element_at(map(a, 1), a)| ++ |1 | ++ ``` ### How was this patch tested? By running new tests: ``` $ build/sbt "test:testOnly *ParametersSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. 
Closes #42894 from MaxGekk/fix-parameterized-sql-unresolved. Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../sql/connect/planner/SparkConnectPlanner.scala | 2 +- .../spark/sql/catalyst/analysis/Analyzer.scala | 2 +- .../spark/sql/catalyst/analysis/parameters.scala | 28 +- .../sql/catalyst/analysis/AnalysisSuite.scala | 4 ++-- .../org/apache/spark/sql/ParametersSuite.scala | 19 --- 5 files changed, 42 insertions(+), 13 deletions(-) diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala index 24dee006f0b..74a8ff290eb 100644 --- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala +++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala @@ -269,7 +269,7 @@ class SparkConnectPlanner(val sessionHolder: SessionHolder) extends Logging { if (!args.isEmpty) { NameParameterizedQuery(parsedPlan, args.asScala.mapValues(transformLiteral).toMap) } else if (!posArgs.isEmpty) { - PosParameterizedQuery(parsedPlan, posArgs.asScala.map(transformLiteral).toArray) + PosParameterizedQuery(parsedPlan, posArgs.asScala.map(transformLiteral).toSeq) } else { parsedPlan } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index e15b9730111..6491a4eea95 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -260,7 +260,6 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor // at the beginning of analysis. 
OptimizeUpdateFields, CTESubstitution, - BindParameters, WindowsSubstitution, EliminateUnions, SubstituteUnresolvedOrdinals), @@ -322,6 +321,7 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor RewriteDeleteFromTable :: RewriteUpdateTable :: RewriteMergeIntoTable :: + BindParameters :: typeCoercionRules ++ Seq( ResolveWithCTE, diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala index 13404797490
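The effect of `BindParameters` can be pictured with a toy substitution step. This is a simplified model, not Spark's rule: Spark substitutes whole expression trees during resolution (which is exactly what lets `map()`-style constructor arguments be resolved first), whereas this sketch only substitutes tokens in a flat query.

```python
def bind_parameters(tokens, args):
    """Replace each `:name` placeholder token with its bound argument;
    an unbound name raises an UNBOUND_SQL_PARAMETER-style error."""
    bound = []
    for tok in tokens:
        if tok.startswith(":"):
            name = tok[1:]
            if name not in args:
                raise KeyError(
                    f"[UNBOUND_SQL_PARAMETER] Found the unbound parameter: {name}.")
            bound.append(args[name])
        else:
            bound.append(tok)
    return bound

print(bind_parameters(["element_at(", ":mapParam", ", 'a')"],
                      {"mapParam": "map('a', 1)"}))
```

The bug fixed above was an ordering problem, not a substitution problem: binding too early (in the `Substitution` batch) meant constructor-function arguments were still unresolved when the rule looked for them.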
[spark] branch master updated (cd672b09ac6 -> 6653f94d489)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from cd672b09ac6 [SPARK-45162][SQL] Support maps and array parameters constructed via `call_function` add 6653f94d489 [SPARK-45156][SQL] Wrap `inputName` by backticks in the `NON_FOLDABLE_INPUT` error class No new revisions were added by this update. Summary of changes: .../expressions/CallMethodViaReflection.scala | 4 ++-- .../aggregate/ApproxCountDistinctForIntervals.scala | 2 +- .../expressions/aggregate/ApproximatePercentile.scala | 4 ++-- .../expressions/aggregate/BloomFilterAggregate.scala | 4 ++-- .../expressions/aggregate/CountMinSketchAgg.scala | 6 +++--- .../expressions/aggregate/HistogramNumeric.scala | 2 +- .../catalyst/expressions/aggregate/percentiles.scala | 2 +- .../sql/catalyst/expressions/csvExpressions.scala | 2 +- .../spark/sql/catalyst/expressions/generators.scala | 2 +- .../sql/catalyst/expressions/jsonExpressions.scala| 2 +- .../sql/catalyst/expressions/maskExpressions.scala| 2 +- .../sql/catalyst/expressions/mathExpressions.scala| 2 +- .../sql/catalyst/expressions/regexpExpressions.scala | 2 +- .../sql/catalyst/expressions/stringExpressions.scala | 2 +- .../sql/catalyst/expressions/windowExpressions.scala | 19 +++ .../spark/sql/catalyst/expressions/xml/xpath.scala| 2 +- .../sql/catalyst/expressions/xmlExpressions.scala | 2 +- .../analysis/ExpressionTypeCheckingSuite.scala| 6 +++--- .../expressions/CallMethodViaReflectionSuite.scala| 2 +- .../catalyst/expressions/RegexpExpressionsSuite.scala | 2 +- .../catalyst/expressions/StringExpressionsSuite.scala | 6 +++--- .../ApproxCountDistinctForIntervalsSuite.scala| 2 +- .../aggregate/ApproximatePercentileSuite.scala| 4 ++-- .../aggregate/CountMinSketchAggSuite.scala| 6 +++--- .../expressions/aggregate/HistogramNumericSuite.scala | 2 +- .../expressions/aggregate/PercentileSuite.scala | 2 +- 
.../expressions/xml/XPathExpressionSuite.scala| 2 +- .../analyzer-results/ansi/string-functions.sql.out| 2 +- .../sql-tests/analyzer-results/csv-functions.sql.out | 2 +- .../sql-tests/analyzer-results/join-lateral.sql.out | 2 +- .../sql-tests/analyzer-results/json-functions.sql.out | 2 +- .../sql-tests/analyzer-results/mask-functions.sql.out | 4 ++-- .../sql-tests/analyzer-results/percentiles.sql.out| 2 +- .../analyzer-results/string-functions.sql.out | 2 +- .../sql-tests/results/ansi/string-functions.sql.out | 2 +- .../resources/sql-tests/results/csv-functions.sql.out | 2 +- .../resources/sql-tests/results/join-lateral.sql.out | 2 +- .../sql-tests/results/json-functions.sql.out | 2 +- .../sql-tests/results/mask-functions.sql.out | 4 ++-- .../resources/sql-tests/results/percentiles.sql.out | 2 +- .../sql-tests/results/string-functions.sql.out| 2 +- .../spark/sql/DataFrameWindowFunctionsSuite.scala | 2 +- .../org/apache/spark/sql/GeneratorFunctionSuite.scala | 2 +- 43 files changed, 67 insertions(+), 64 deletions(-)
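The backtick-wrapping that SPARK-45156 applies to `inputName` follows Spark's identifier-quoting convention for error messages. A small approximation of that quoting (the actual rules live in Spark's `toSQLId`/`quoteIfNeeded` helpers, so treat this as a sketch):

```python
def to_sql_id(name: str) -> str:
    # Quote each dot-separated part in backticks, doubling any embedded
    # backtick -- the style used in NON_FOLDABLE_INPUT error messages.
    return ".".join("`" + part.replace("`", "``") + "`"
                    for part in name.split("."))

print(to_sql_id("inputName"))  # `inputName`
print(to_sql_id("a.b"))        # `a`.`b`
```

Quoting identifiers consistently lets users distinguish a parameter name from surrounding message text, which is the point of the change.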
[spark] branch master updated: [SPARK-45078][SQL] Fix `array_insert` ImplicitCastInputTypes not work
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e84c66db60c [SPARK-45078][SQL] Fix `array_insert` ImplicitCastInputTypes not work e84c66db60c is described below commit e84c66db60c78476806161479344cd32a7606ab1 Author: Jia Fan AuthorDate: Sun Sep 17 11:16:24 2023 +0300 [SPARK-45078][SQL] Fix `array_insert` ImplicitCastInputTypes not work ### What changes were proposed in this pull request? This PR fixes `array_insert` throwing an exception when the inserted column's type differs from the array's element type, even in cases that should execute successfully, e.g.: ```sql select array_insert(array(1), 2, cast(2 as tinyint)) ``` `ImplicitCastInputTypes` in `ArrayInsert` currently always returns an empty array, so Spark cannot convert `tinyint` to `int`. ### Why are the changes needed? To fix erroneous behavior in `array_insert`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added a new test. ### Was this patch authored or co-authored using generative AI tooling? No Closes #42951 from Hisoka-X/SPARK-45078_arrayinsert_type_mismatch. 
Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../spark/sql/catalyst/expressions/collectionOperations.scala | 1 - .../test/resources/sql-tests/analyzer-results/ansi/array.sql.out | 7 +++ .../src/test/resources/sql-tests/analyzer-results/array.sql.out | 7 +++ sql/core/src/test/resources/sql-tests/inputs/array.sql| 1 + sql/core/src/test/resources/sql-tests/results/ansi/array.sql.out | 8 sql/core/src/test/resources/sql-tests/results/array.sql.out | 8 6 files changed, 31 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala index 957aa1ab2d5..9c9127efb17 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala @@ -4749,7 +4749,6 @@ case class ArrayInsert( } case (e1, e2, e3) => Seq.empty } -Seq.empty } override def checkInputDataTypes(): TypeCheckResult = { diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/ansi/array.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/ansi/array.sql.out index cd101c7a524..6fc30815793 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/ansi/array.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/ansi/array.sql.out @@ -531,6 +531,13 @@ Project [array_insert(array(2, 3, cast(null as int), 4), -5, 1, false) AS array_ +- OneRowRelation +-- !query +select array_insert(array(1), 2, cast(2 as tinyint)) +-- !query analysis +Project [array_insert(array(1), 2, cast(cast(2 as tinyint) as int), false) AS array_insert(array(1), 2, CAST(2 AS TINYINT))#x] ++- OneRowRelation + + -- !query set spark.sql.legacy.negativeIndexInArrayInsert=true -- !query analysis diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/array.sql.out 
b/sql/core/src/test/resources/sql-tests/analyzer-results/array.sql.out index 8279fb3362e..e0585b77cb6 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/array.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/array.sql.out @@ -531,6 +531,13 @@ Project [array_insert(array(2, 3, cast(null as int), 4), -5, 1, false) AS array_ +- OneRowRelation +-- !query +select array_insert(array(1), 2, cast(2 as tinyint)) +-- !query analysis +Project [array_insert(array(1), 2, cast(cast(2 as tinyint) as int), false) AS array_insert(array(1), 2, CAST(2 AS TINYINT))#x] ++- OneRowRelation + + -- !query set spark.sql.legacy.negativeIndexInArrayInsert=true -- !query analysis diff --git a/sql/core/src/test/resources/sql-tests/inputs/array.sql b/sql/core/src/test/resources/sql-tests/inputs/array.sql index 48edc6b4742..52a0906ea73 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/array.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/array.sql @@ -141,6 +141,7 @@ select array_insert(array(1, 2, 3, NULL), cast(NULL as INT), 4); select array_insert(array(1, 2, 3, NULL), 4, cast(NULL as INT)); select array_insert(array(2, 3, NULL, 4), 5, 5); select array_insert(array(2, 3, NULL, 4), -5, 1); +select array_insert(array(1), 2, cast(2 as tinyint)); set spark.sql.legacy.negativeIndexInArrayInsert=true; select array_insert(array(1, 3, 4), -2, 2); diff --git a/sql/core/src/test/resources/sql-tests/results/ansi/array.sql.out b/sql/core/src/test/resources/sql-te
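The fix above re-enables implicit widening between the array's element type and the inserted value's type. A simplified model of that decision (illustrative only; Spark's real coercion goes through `ImplicitCastInputTypes` and the type-coercion rules, and this table covers just the integral types):

```python
# Byte widths of Spark's integral types; the wider type wins, so inserting
# a tinyint into array<int> implicitly casts the value to int.
WIDTH = {"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8}

def implicit_cast_target(element_type, insert_type):
    if element_type == insert_type:
        return element_type
    if element_type in WIDTH and insert_type in WIDTH:
        return max(element_type, insert_type, key=WIDTH.get)
    return None  # no implicit integral widening applies

print(implicit_cast_target("int", "tinyint"))  # int, as in the new test case
```

The bug was mechanical: a stray trailing `Seq.empty` discarded the computed cast targets, which is why the one-line deletion in the diff is the whole fix.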
[spark] branch branch-3.5 updated: [SPARK-45078][SQL] Fix `array_insert` ImplicitCastInputTypes not work
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 723a85eb2df [SPARK-45078][SQL] Fix `array_insert` ImplicitCastInputTypes not work 723a85eb2df is described below commit 723a85eb2dffa69571cba841380eb759a9b89321 Author: Jia Fan AuthorDate: Sun Sep 17 11:16:24 2023 +0300 [SPARK-45078][SQL] Fix `array_insert` ImplicitCastInputTypes not work ### What changes were proposed in this pull request? This PR fixes `array_insert` throwing an exception when the inserted column's type differs from the array's element type, even in cases that should execute successfully, e.g.: ```sql select array_insert(array(1), 2, cast(2 as tinyint)) ``` `ImplicitCastInputTypes` in `ArrayInsert` currently always returns an empty array, so Spark cannot convert `tinyint` to `int`. ### Why are the changes needed? To fix erroneous behavior in `array_insert`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added a new test. ### Was this patch authored or co-authored using generative AI tooling? No Closes #42951 from Hisoka-X/SPARK-45078_arrayinsert_type_mismatch. 
Authored-by: Jia Fan Signed-off-by: Max Gekk (cherry picked from commit e84c66db60c78476806161479344cd32a7606ab1) Signed-off-by: Max Gekk --- .../spark/sql/catalyst/expressions/collectionOperations.scala | 1 - .../test/resources/sql-tests/analyzer-results/ansi/array.sql.out | 7 +++ .../src/test/resources/sql-tests/analyzer-results/array.sql.out | 7 +++ sql/core/src/test/resources/sql-tests/inputs/array.sql| 1 + sql/core/src/test/resources/sql-tests/results/ansi/array.sql.out | 8 sql/core/src/test/resources/sql-tests/results/array.sql.out | 8 6 files changed, 31 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala index fe9c4015c15..ade4a6c5be7 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala @@ -4711,7 +4711,6 @@ case class ArrayInsert( } case (e1, e2, e3) => Seq.empty } -Seq.empty } override def checkInputDataTypes(): TypeCheckResult = { diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/ansi/array.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/ansi/array.sql.out index cd101c7a524..6fc30815793 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/ansi/array.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/ansi/array.sql.out @@ -531,6 +531,13 @@ Project [array_insert(array(2, 3, cast(null as int), 4), -5, 1, false) AS array_ +- OneRowRelation +-- !query +select array_insert(array(1), 2, cast(2 as tinyint)) +-- !query analysis +Project [array_insert(array(1), 2, cast(cast(2 as tinyint) as int), false) AS array_insert(array(1), 2, CAST(2 AS TINYINT))#x] ++- OneRowRelation + + -- !query set spark.sql.legacy.negativeIndexInArrayInsert=true -- !query analysis 
diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/array.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/array.sql.out index 8279fb3362e..e0585b77cb6 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/array.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/array.sql.out @@ -531,6 +531,13 @@ Project [array_insert(array(2, 3, cast(null as int), 4), -5, 1, false) AS array_ +- OneRowRelation +-- !query +select array_insert(array(1), 2, cast(2 as tinyint)) +-- !query analysis +Project [array_insert(array(1), 2, cast(cast(2 as tinyint) as int), false) AS array_insert(array(1), 2, CAST(2 AS TINYINT))#x] ++- OneRowRelation + + -- !query set spark.sql.legacy.negativeIndexInArrayInsert=true -- !query analysis diff --git a/sql/core/src/test/resources/sql-tests/inputs/array.sql b/sql/core/src/test/resources/sql-tests/inputs/array.sql index 48edc6b4742..52a0906ea73 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/array.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/array.sql @@ -141,6 +141,7 @@ select array_insert(array(1, 2, 3, NULL), cast(NULL as INT), 4); select array_insert(array(1, 2, 3, NULL), 4, cast(NULL as INT)); select array_insert(array(2, 3, NULL, 4), 5, 5); select array_insert(array(2, 3, NULL, 4), -5, 1); +select array_insert(array(1), 2, cast(2 as tinyint)); set spark.sql.legacy.negativeIndexInArrayInsert=true; select array_insert(array(1, 3, 4), -2, 2); diff --gi
[spark] branch master updated: [SPARK-45034][SQL] Support deterministic mode function
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f5365d0dc59 [SPARK-45034][SQL] Support deterministic mode function f5365d0dc59 is described below commit f5365d0dc590d4965a269da223dbd72fbb764595 Author: Peter Toth AuthorDate: Sun Sep 17 21:37:57 2023 +0300 [SPARK-45034][SQL] Support deterministic mode function ### What changes were proposed in this pull request? This PR adds a new optional argument to the `mode` aggregate function to provide deterministic results. When multiple values have the same greatest frequency, the new boolean argument can be used to get the lowest or highest value instead of an arbitrary one. ### Why are the changes needed? To make the function more user-friendly. ### Does this PR introduce _any_ user-facing change? Yes, it adds a new argument to the `mode` function. ### How was this patch tested? Added new UTs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42755 from peter-toth/SPARK-45034-deterministic-mode-function. 
Authored-by: Peter Toth Signed-off-by: Max Gekk --- .../scala/org/apache/spark/sql/functions.scala | 14 ++- .../explain-results/function_mode.explain | 2 +- .../query-tests/queries/function_mode.json | 4 + .../query-tests/queries/function_mode.proto.bin| Bin 173 -> 179 bytes python/pyspark/sql/connect/functions.py| 4 +- python/pyspark/sql/functions.py| 35 -- .../sql/catalyst/expressions/aggregate/Mode.scala | 76 ++-- .../scala/org/apache/spark/sql/functions.scala | 16 ++- .../sql-functions/sql-expression-schema.md | 2 +- .../sql-tests/analyzer-results/group-by.sql.out| 120 ++- .../test/resources/sql-tests/inputs/group-by.sql | 11 ++ .../resources/sql-tests/results/group-by.sql.out | 132 - .../apache/spark/sql/DatasetAggregatorSuite.scala | 10 ++ 13 files changed, 397 insertions(+), 29 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala index b2102d4ba55..83f0ee64501 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala @@ -827,7 +827,19 @@ object functions { * @group agg_funcs * @since 3.4.0 */ - def mode(e: Column): Column = Column.fn("mode", e) + def mode(e: Column): Column = mode(e, deterministic = false) + + /** + * Aggregate function: returns the most frequent value in a group. + * + * When multiple values have the same greatest frequency then either any of values is returned + * if deterministic is false or is not defined, or the lowest value is returned if deterministic + * is true. + * + * @group agg_funcs + * @since 4.0.0 + */ + def mode(e: Column, deterministic: Boolean): Column = Column.fn("mode", e, lit(deterministic)) /** * Aggregate function: returns the maximum value of the expression in a group. 
diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_mode.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_mode.explain index dfa2113a2c3..28bbb44b0fd 100644 --- a/connector/connect/common/src/test/resources/query-tests/explain-results/function_mode.explain +++ b/connector/connect/common/src/test/resources/query-tests/explain-results/function_mode.explain @@ -1,2 +1,2 @@ -Aggregate [mode(a#0, 0, 0) AS mode(a)#0] +Aggregate [mode(a#0, 0, 0, false) AS mode(a, false)#0] +- LocalRelation , [id#0L, a#0, b#0, d#0, e#0, f#0, g#0] diff --git a/connector/connect/common/src/test/resources/query-tests/queries/function_mode.json b/connector/connect/common/src/test/resources/query-tests/queries/function_mode.json index 8e8183e9e08..5c26edee803 100644 --- a/connector/connect/common/src/test/resources/query-tests/queries/function_mode.json +++ b/connector/connect/common/src/test/resources/query-tests/queries/function_mode.json @@ -18,6 +18,10 @@ "unresolvedAttribute": { "unparsedIdentifier": "a" } +}, { + "literal": { +"boolean": false + } }] } }] diff --git a/connector/connect/common/src/test/resources/query-tests/queries/function_mode.proto.bin b/connector/connect/common/src/test/resources/query-tests/queries/function_mode.proto.bin index dca0953a387..cc115e43172 100644 Binary files a/connector/connect/comm
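The tie-breaking rule that the new `deterministic` flag introduces can be sketched in plain Python. This is an illustrative model of the documented semantics, not Spark's aggregate implementation:

```python
from collections import Counter

def mode(values, deterministic=False):
    """Return the most frequent value in `values`.

    When several values share the greatest frequency, deterministic=False may
    return any of them, while deterministic=True returns the lowest one,
    mirroring the new boolean argument of Spark's `mode` aggregate.
    """
    counts = Counter(values)
    top = max(counts.values())
    candidates = [v for v, c in counts.items() if c == top]
    return min(candidates) if deterministic else candidates[0]
```

With `deterministic=True`, `mode([1, 1, 2, 2, 3])` always yields `1`; without it, either `1` or `2` would be an acceptable answer.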
[spark] branch master updated (8d363c6e2c8 -> 0dda75f824d)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 8d363c6e2c8 [SPARK-45196][PYTHON][DOCS] Refine docstring of `array/array_contains/arrays_overlap` add 0dda75f824d [SPARK-45137][CONNECT] Support map/array parameters in parameterized `sql()` No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/SparkSession.scala | 6 +- .../org/apache/spark/sql/ClientE2ETestSuite.scala | 7 + .../src/main/protobuf/spark/connect/commands.proto | 12 +- .../main/protobuf/spark/connect/relations.proto| 12 +- .../sql/connect/planner/SparkConnectPlanner.scala | 26 +- python/pyspark/sql/connect/proto/commands_pb2.py | 164 +++-- python/pyspark/sql/connect/proto/commands_pb2.pyi | 60 - python/pyspark/sql/connect/proto/relations_pb2.py | 268 +++-- python/pyspark/sql/connect/proto/relations_pb2.pyi | 60 - 9 files changed, 396 insertions(+), 219 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
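Conceptually, parameterized `sql()` replaces `:name` and `?` markers with the supplied values. The sketch below models that substitution textually for illustration only; Spark itself binds the parameters as expressions in the analyzed plan (which is what lets map/array constructor columns work), so the helper names here are hypothetical:

```python
import re

def _lit(v):
    """Render a Python value as a SQL literal; strings are quoted and escaped."""
    if isinstance(v, str):
        return "'" + v.replace("'", "''") + "'"
    return str(v)

def bind_named(query, args):
    """Substitute `:name` markers using a dict of parameter values."""
    return re.sub(r":(\w+)", lambda m: _lit(args[m.group(1)]), query)

def bind_positional(query, args):
    """Substitute `?` markers left-to-right from a sequence of values."""
    it = iter(args)
    return re.sub(r"\?", lambda m: _lit(next(it)), query)
```

For example, `bind_named("SELECT * FROM t WHERE b > :minB", {"minB": 5})` yields `SELECT * FROM t WHERE b > 5`.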
[spark] branch master updated: [SPARK-45188][SQL][DOCS] Update error messages related to parameterized `sql()`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 981312284f0 [SPARK-45188][SQL][DOCS] Update error messages related to parameterized `sql()` 981312284f0 is described below commit 981312284f0776ca847c8d21411f74a72c639b22 Author: Max Gekk AuthorDate: Tue Sep 19 00:22:43 2023 +0300 [SPARK-45188][SQL][DOCS] Update error messages related to parameterized `sql()` ### What changes were proposed in this pull request? In the PR, I propose to update some error formats and comments regarding `sql()` parameters - maps, arrays and struct might be used as `sql()` parameters. New behaviour has been added by https://github.com/apache/spark/pull/42752. ### Why are the changes needed? To inform users about recent changes introduced by SPARK-45033. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running the affected test suite: ``` $ build/sbt "core/testOnly *SparkThrowableSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42957 from MaxGekk/clean-ClientE2ETestSuite. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json | 4 ++-- .../scala/org/apache/spark/sql/SparkSession.scala| 11 +++ docs/sql-error-conditions.md | 4 ++-- python/pyspark/pandas/sql_formatter.py | 3 ++- python/pyspark/sql/session.py| 3 ++- .../spark/sql/catalyst/analysis/parameters.scala | 14 +- .../scala/org/apache/spark/sql/SparkSession.scala| 20 ++-- 7 files changed, 34 insertions(+), 25 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 4740ed72f89..186e7b4640d 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -1892,7 +1892,7 @@ }, "INVALID_SQL_ARG" : { "message" : [ - "The argument of `sql()` is invalid. Consider to replace it by a SQL literal." + "The argument of `sql()` is invalid. Consider to replace it either by a SQL literal or by collection constructor functions such as `map()`, `array()`, `struct()`." ] }, "INVALID_SQL_SYNTAX" : { @@ -2768,7 +2768,7 @@ }, "UNBOUND_SQL_PARAMETER" : { "message" : [ - "Found the unbound parameter: . Please, fix `args` and provide a mapping of the parameter to a SQL literal." + "Found the unbound parameter: . Please, fix `args` and provide a mapping of the parameter to either a SQL literal or collection constructor functions such as `map()`, `array()`, `struct()`." 
], "sqlState" : "42P02" }, diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala index 8788e34893e..5aa8c5a2bd5 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala @@ -235,8 +235,9 @@ class SparkSession private[sql] ( * An array of Java/Scala objects that can be converted to SQL literal expressions. See https://spark.apache.org/docs/latest/sql-ref-datatypes.html";> Supported Data * Types for supported value types in Scala/Java. For example: 1, "Steven", - * LocalDate.of(2023, 4, 2). A value can be also a `Column` of literal expression, in that - * case it is taken as is. + * LocalDate.of(2023, 4, 2). A value can be also a `Column` of a literal or collection + * constructor functions such as `map()`, `array()`, `struct()`, in that case it is taken as + * is. * * @since 3.5.0 */ @@ -272,7 +273,8 @@ class SparkSession private[sql] ( * expressions. See https://spark.apache.org/docs/latest/sql-ref-datatypes.html";> * Supported Data Types for supported value types in Scala/Java. For example, map keys: * "rank", "name", "birthdate"; map values: 1, "Steven", LocalDate.of(2023, 4, 2). Map value - * can be also a `Column` of literal expression, in that case it is taken as is. + * can be also a `Column` of a literal or collection constructor functions such as `map()`, + * `array()`, `struct()`, in that case it is taken as is. * * @since 3.4.0 */ @@ -292,7 +294,8 @@ class SparkSession private[sql] ( * e
[spark] branch master updated: [SPARK-45224][PYTHON] Add examples w/ map and array as parameters of `sql()`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c89221b02bb3 [SPARK-45224][PYTHON] Add examples w/ map and array as parameters of `sql()` c89221b02bb3 is described below commit c89221b02bb3000f707a31322e6d40b561e527bd Author: Max Gekk AuthorDate: Wed Sep 20 11:09:01 2023 +0300 [SPARK-45224][PYTHON] Add examples w/ map and array as parameters of `sql()` ### What changes were proposed in this pull request? In the PR, I propose to add a few more examples for the `sql()` method in PySpark API with array and map parameters. ### Why are the changes needed? To inform users about recent changes introduced by #42752 and #42470, and check the changes work actually. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running new examples: ``` $ python/run-tests --parallelism=1 --testnames 'pyspark.sql.session SparkSession.sql' ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42996 from MaxGekk/map-sql-parameterized-python-connect. Authored-by: Max Gekk Signed-off-by: Max Gekk --- python/pyspark/sql/session.py | 30 +- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py index dc4f8f321a59..de2e8d0cda2a 100644 --- a/python/pyspark/sql/session.py +++ b/python/pyspark/sql/session.py @@ -1599,23 +1599,27 @@ class SparkSession(SparkConversionMixin): And substitude named parameters with the `:` prefix by SQL literals. ->>> spark.sql("SELECT * FROM {df} WHERE {df[B]} > :minB", {"minB" : 5}, df=mydf).show() -+---+---+ -| A| B| -+---+---+ -| 3| 6| -+---+---+ +>>> from pyspark.sql.functions import create_map +>>> spark.sql( +... "SELECT *, element_at(:m, 'a') AS C FROM {df} WHERE {df[B]} > :minB", +... 
{"minB" : 5, "m" : create_map(lit('a'), lit(1))}, df=mydf).show() ++---+---+---+ +| A| B| C| ++---+---+---+ +| 3| 6| 1| ++---+---+---+ Or positional parameters marked by `?` in the SQL query by SQL literals. +>>> from pyspark.sql.functions import array >>> spark.sql( -... "SELECT * FROM {df} WHERE {df[B]} > ? and ? < {df[A]}", -... args=[5, 2], df=mydf).show() -+---+---+ -| A| B| -+---+---+ -| 3| 6| -+---+---+ +... "SELECT *, element_at(?, 1) AS C FROM {df} WHERE {df[B]} > ? and ? < {df[A]}", +... args=[array(lit(1), lit(2), lit(3)), 5, 2], df=mydf).show() ++---+---+---+ +| A| B| C| ++---+---+---+ +| 3| 6| 1| ++---+---+---+ """ formatter = SQLStringFormatter(self) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-45235][CONNECT][PYTHON] Support map and array parameters by `sql()`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a2bab5efc5b [SPARK-45235][CONNECT][PYTHON] Support map and array parameters by `sql()` a2bab5efc5b is described below commit a2bab5efc5b5f0e841e9b34ccbfd2cb99af5923e Author: Max Gekk AuthorDate: Thu Sep 21 09:05:30 2023 +0300 [SPARK-45235][CONNECT][PYTHON] Support map and array parameters by `sql()` ### What changes were proposed in this pull request? In the PR, I propose to change the Python connect client to support `Column` as a parameter of `sql()`. ### Why are the changes needed? To achieve feature parity w/ regular PySpark which supports map and arrays as parameters of `sql()`, see https://github.com/apache/spark/pull/42996. ### Does this PR introduce _any_ user-facing change? No. It fixes a bug. ### How was this patch tested? By running the modified tests: ``` $ python/run-tests --parallelism=1 --testnames 'pyspark.sql.tests.connect.test_connect_basic SparkConnectBasicTests.test_sql_with_named_args' $ python/run-tests --parallelism=1 --testnames 'pyspark.sql.tests.connect.test_connect_basic SparkConnectBasicTests.test_sql_with_pos_args' ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43014 from MaxGekk/map-sql-parameterized-python-connect-2. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- python/pyspark/sql/connect/plan.py | 22 ++ python/pyspark/sql/connect/session.py | 2 +- .../sql/tests/connect/test_connect_basic.py| 12 3 files changed, 19 insertions(+), 17 deletions(-) diff --git a/python/pyspark/sql/connect/plan.py b/python/pyspark/sql/connect/plan.py index 3e8db2aae09..d069081e1af 100644 --- a/python/pyspark/sql/connect/plan.py +++ b/python/pyspark/sql/connect/plan.py @@ -1049,6 +1049,12 @@ class SQL(LogicalPlan): self._query = query self._args = args +def _to_expr(self, session: "SparkConnectClient", v: Any) -> proto.Expression: +if isinstance(v, Column): +return v.to_plan(session) +else: +return LiteralExpression._from_value(v).to_plan(session) + def plan(self, session: "SparkConnectClient") -> proto.Relation: plan = self._create_proto_relation() plan.sql.query = self._query @@ -1056,14 +1062,10 @@ class SQL(LogicalPlan): if self._args is not None and len(self._args) > 0: if isinstance(self._args, Dict): for k, v in self._args.items(): -plan.sql.args[k].CopyFrom( - LiteralExpression._from_value(v).to_plan(session).literal -) + plan.sql.named_arguments[k].CopyFrom(self._to_expr(session, v)) else: for v in self._args: -plan.sql.pos_args.append( - LiteralExpression._from_value(v).to_plan(session).literal -) +plan.sql.pos_arguments.append(self._to_expr(session, v)) return plan @@ -1073,14 +1075,10 @@ class SQL(LogicalPlan): if self._args is not None and len(self._args) > 0: if isinstance(self._args, Dict): for k, v in self._args.items(): -cmd.sql_command.args[k].CopyFrom( - LiteralExpression._from_value(v).to_plan(session).literal -) + cmd.sql_command.named_arguments[k].CopyFrom(self._to_expr(session, v)) else: for v in self._args: -cmd.sql_command.pos_args.append( - LiteralExpression._from_value(v).to_plan(session).literal -) + cmd.sql_command.pos_arguments.append(self._to_expr(session, v)) return cmd diff --git a/python/pyspark/sql/connect/session.py b/python/pyspark/sql/connect/session.py index 
7582fe86ff2..e5d1d95a699 100644 --- a/python/pyspark/sql/connect/session.py +++ b/python/pyspark/sql/connect/session.py @@ -557,7 +557,7 @@ class SparkSession: if "sql_command_result" in properties: return DataFrame.withPlan(CachedRelation(properties["sql_command_result"]), self) else: -return DataFrame.withPlan(SQL(sqlQuery, args), self) +return DataFrame.withPlan(cmd, self) sql.__doc__ = PySparkSession.sql.__doc__ diff --git a/python/pyspark/sql/tests/connect/test_connect_basic.py b/python/pyspark/sql/tests/connect/test_connect_basic.py index 2b979570618..c5a127136d6 100644 --- a/python/pyspark/sql/te
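The `_to_expr` helper added above dispatches on the argument type: a `Column` (for example a `map()`/`array()` constructor) is converted via its own plan, while any other value is wrapped as a literal. A minimal stand-alone model of that dispatch follows; the `Column` class here is a stand-in for illustration, not the real pyspark one:

```python
class Column:
    """Minimal stand-in for a pyspark Column carrying an expression string."""
    def __init__(self, expr):
        self.expr = expr

def to_expr(v):
    """Mirror the shape of the `_to_expr` helper in the diff above:
    Column-valued arguments pass through as expressions, everything else
    becomes a literal."""
    if isinstance(v, Column):
        return ("expression", v.expr)  # e.g. map()/array()/struct() constructors
    return ("literal", v)              # plain Python values become SQL literals
```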
[spark] branch master updated: [SPARK-43254][SQL] Assign a name to the error _LEGACY_ERROR_TEMP_2018
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8b967e191b7 [SPARK-43254][SQL] Assign a name to the error _LEGACY_ERROR_TEMP_2018 8b967e191b7 is described below commit 8b967e191b755d7f2830c15d382c83ce7aeb69c1 Author: dengziming AuthorDate: Thu Sep 21 10:22:37 2023 +0300 [SPARK-43254][SQL] Assign a name to the error _LEGACY_ERROR_TEMP_2018 ### What changes were proposed in this pull request? Assign the name `CLASS_UNSUPPORTED_BY_MAP_OBJECTS` to the legacy error class `_LEGACY_ERROR_TEMP_2018`. ### Why are the changes needed? To assign proper name as a part of activity in SPARK-37935 ### Does this PR introduce _any_ user-facing change? Yes, the error message will include the error class name ### How was this patch tested? Add a unit test to produce the error from user code. ### Was this patch authored or co-authored using generative AI tooling? No Closes #42939 from dengziming/SPARK-43254. Authored-by: dengziming Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 10 +++--- docs/sql-error-conditions.md | 6 .../sql/catalyst/encoders/ExpressionEncoder.scala | 2 +- .../spark/sql/errors/QueryExecutionErrors.scala| 2 +- .../expressions/ObjectExpressionsSuite.scala | 11 +++--- .../scala/org/apache/spark/sql/DatasetSuite.scala | 40 -- 6 files changed, 57 insertions(+), 14 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index d92ccfce5c5..8942d3755e9 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -344,6 +344,11 @@ ], "sqlState" : "22003" }, + "CLASS_UNSUPPORTED_BY_MAP_OBJECTS" : { +"message" : [ + "`MapObjects` does not support the class as resulting collection." 
+] + }, "CODEC_NOT_AVAILABLE" : { "message" : [ "The codec is not available. Consider to set the config to ." @@ -4944,11 +4949,6 @@ "not resolved." ] }, - "_LEGACY_ERROR_TEMP_2018" : { -"message" : [ - "class `` is not supported by `MapObjects` as resulting collection." -] - }, "_LEGACY_ERROR_TEMP_2020" : { "message" : [ "Couldn't find a valid constructor on ." diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md index 1df00f72bc9..f6f94efc2b0 100644 --- a/docs/sql-error-conditions.md +++ b/docs/sql-error-conditions.md @@ -297,6 +297,12 @@ The value `` of the type `` cannot be cast to `` Fail to assign a value of `` type to the `` type column or variable `` due to an overflow. Use `try_cast` on the input value to tolerate overflow and return NULL instead. +### CLASS_UNSUPPORTED_BY_MAP_OBJECTS + +SQLSTATE: none assigned + +`MapObjects` does not support the class `` as resulting collection. + ### CODEC_NOT_AVAILABLE SQLSTATE: none assigned diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala index ff72b5a0d96..74d7a5e7a67 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala @@ -170,7 +170,7 @@ object ExpressionEncoder { * Function that deserializes an [[InternalRow]] into an object of type `T`. This class is not * thread-safe. 
*/ - class Deserializer[T](private val expressions: Seq[Expression]) + class Deserializer[T](val expressions: Seq[Expression]) extends (InternalRow => T) with Serializable { @transient private[this] var constructProjection: Projection = _ diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala index e14fef1fad7..84472490128 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala @@ -422,7 +422,7 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase with ExecutionE def classUnsupportedByMapObjectsError(cls: Class[_]): SparkRuntimeException = { new SparkRuntimeException( - errorClass = "_LEGACY_ERROR_TEMP_2018", + errorClass = "CLASS_UNSUPPORTED_BY_MAP_OBJECTS&qu
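Error classes such as the renamed `CLASS_UNSUPPORTED_BY_MAP_OBJECTS` are message templates with `<param>` placeholders stored in `error-classes.json`. A rough sketch of how such a template is rendered (the dictionary below is an illustrative excerpt and `format_error` is a hypothetical helper, not Spark's actual error-class reader):

```python
import re

# Illustrative excerpt modeled on error-classes.json.
ERROR_CLASSES = {
    "CLASS_UNSUPPORTED_BY_MAP_OBJECTS": {
        "message": ["`MapObjects` does not support the class <cls> as resulting collection."],
    },
}

def format_error(error_class, params):
    """Render an error-class template, substituting each <param> placeholder."""
    template = " ".join(ERROR_CLASSES[error_class]["message"])
    return re.sub(r"<(\w+)>", lambda m: str(params[m.group(1)]), template)
```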
[spark] branch master updated: [SPARK-45316][CORE][SQL] Add new parameters `ignoreCorruptFiles`/`ignoreMissingFiles` to `HadoopRDD` and `NewHadoopRDD`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 60d02b444e2 [SPARK-45316][CORE][SQL] Add new parameters `ignoreCorruptFiles`/`ignoreMissingFiles` to `HadoopRDD` and `NewHadoopRDD` 60d02b444e2 is described below commit 60d02b444e2225b3afbe4955dabbea505e9f769c Author: Max Gekk AuthorDate: Tue Sep 26 17:33:07 2023 +0300 [SPARK-45316][CORE][SQL] Add new parameters `ignoreCorruptFiles`/`ignoreMissingFiles` to `HadoopRDD` and `NewHadoopRDD` ### What changes were proposed in this pull request? In the PR, I propose to add new parameters `ignoreCorruptFiles`/`ignoreMissingFiles` to `HadoopRDD` and `NewHadoopRDD`, and set them to the current values of: - `spark.files.ignoreCorruptFiles`/`ignoreMissingFiles` in Spark `core`, - `spark.sql.files.ignoreCorruptFiles`/`ignoreMissingFiles` when the RDDs are created in Spark SQL. ### Why are the changes needed? 1. To make `HadoopRDD` and `NewHadoopRDD` consistent with other RDDs like `FileScanRDD` created by Spark SQL that take into account the SQL configs `spark.sql.files.ignoreCorruptFiles`/`ignoreMissingFiles`. 2. To improve user experience with Spark SQL, so users can control the ignoring of missing files without re-creating the Spark context. ### Does this PR introduce _any_ user-facing change? Yes, `HadoopRDD`/`NewHadoopRDD` invoked by SQL code such as Hive table scans respect the SQL configs `spark.sql.files.ignoreCorruptFiles`/`ignoreMissingFiles` and don't respect the core configs `spark.files.ignoreCorruptFiles`/`ignoreMissingFiles`. ### How was this patch tested? By running the affected tests: ``` $ build/sbt "test:testOnly *QueryPartitionSuite" $ build/sbt "test:testOnly *FileSuite" $ build/sbt "test:testOnly *FileBasedDataSourceSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. 
Closes #43097 from MaxGekk/dynamic-ignoreMissingFiles. Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../scala/org/apache/spark/rdd/HadoopRDD.scala | 31 ++ .../scala/org/apache/spark/rdd/NewHadoopRDD.scala | 27 +++ docs/sql-migration-guide.md| 1 + .../org/apache/spark/sql/hive/TableReader.scala| 9 --- .../spark/sql/hive/QueryPartitionSuite.scala | 6 ++--- 5 files changed, 58 insertions(+), 16 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala index cad107256c5..0b5f6a3d716 100644 --- a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala +++ b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala @@ -89,6 +89,8 @@ private[spark] class HadoopPartition(rddId: Int, override val index: Int, s: Inp * @param keyClass Class of the key associated with the inputFormatClass. * @param valueClass Class of the value associated with the inputFormatClass. * @param minPartitions Minimum number of HadoopRDD partitions (Hadoop Splits) to generate. + * @param ignoreCorruptFiles Whether to ignore corrupt files. + * @param ignoreMissingFiles Whether to ignore missing files. 
* * @note Instantiating this class directly is not recommended, please use * `org.apache.spark.SparkContext.hadoopRDD()` @@ -101,13 +103,36 @@ class HadoopRDD[K, V]( inputFormatClass: Class[_ <: InputFormat[K, V]], keyClass: Class[K], valueClass: Class[V], -minPartitions: Int) +minPartitions: Int, +ignoreCorruptFiles: Boolean, +ignoreMissingFiles: Boolean) extends RDD[(K, V)](sc, Nil) with Logging { if (initLocalJobConfFuncOpt.isDefined) { sparkContext.clean(initLocalJobConfFuncOpt.get) } + def this( + sc: SparkContext, + broadcastedConf: Broadcast[SerializableConfiguration], + initLocalJobConfFuncOpt: Option[JobConf => Unit], + inputFormatClass: Class[_ <: InputFormat[K, V]], + keyClass: Class[K], + valueClass: Class[V], + minPartitions: Int) = { +this( + sc, + broadcastedConf, + initLocalJobConfFuncOpt, + inputFormatClass, + keyClass, + valueClass, + minPartitions, + ignoreCorruptFiles = sc.conf.get(IGNORE_CORRUPT_FILES), + ignoreMissingFiles = sc.conf.get(IGNORE_MISSING_FILES) +) + } + def this( sc: SparkContext, conf: JobConf, @@ -135,10 +160,6 @@ class HadoopRDD[K, V]( private val shouldCloneJobConf = sparkContext.conf.getBoolean("spark.hadoop.cloneConf", false) - private val ignoreCorruptFiles = sparkContext.conf.get(IGNORE_CORRUPT_FILES) - - private val ignoreMissingFiles = sparkContext.conf.get(IGNORE_MISSING_FILES) - private val ignoreEmptySplits =
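The behaviour the two new flags control can be modelled outside of Spark as well. The sketch below is a hypothetical plain-Python reader that skips missing or unreadable files when asked to, mirroring what `ignoreMissingFiles`/`ignoreCorruptFiles` do per RDD (Spark applies them inside `HadoopRDD`'s record reader rather than reading files like this):

```python
def read_all_lines(paths, ignore_missing_files=False, ignore_corrupt_files=False):
    """Read lines from many files, optionally tolerating problem files.

    Illustrative model only: a missing file is skipped when
    ignore_missing_files is set, and an unreadable/undecodable one is
    skipped when ignore_corrupt_files is set; otherwise the error surfaces.
    """
    lines = []
    for p in paths:
        try:
            with open(p, encoding="utf-8") as f:
                lines.extend(f.read().splitlines())
        except FileNotFoundError:
            if not ignore_missing_files:
                raise
        except (OSError, UnicodeDecodeError):
            if not ignore_corrupt_files:
                raise
    return lines
```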
[spark] branch master updated: [SPARK-45340][SQL] Remove the SQL config `spark.sql.hive.verifyPartitionPath`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new eff46ea77e9 [SPARK-45340][SQL] Remove the SQL config `spark.sql.hive.verifyPartitionPath` eff46ea77e9 is described below commit eff46ea77e9bebef3076277bef1e086833dd Author: Max Gekk AuthorDate: Wed Sep 27 08:28:45 2023 +0300 [SPARK-45340][SQL] Remove the SQL config `spark.sql.hive.verifyPartitionPath` ### What changes were proposed in this pull request? In the PR, I propose to remove already deprecated SQL config `spark.sql.hive.verifyPartitionPath`, and the code under the config. The config has been deprecated since Spark 3.0. ### Why are the changes needed? To improve code maintainability by remove unused code. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running the modified test suite: ``` $ build/sbt "test:testOnly *SQLConfSuite" $ build/sbt "test:testOnly *QueryPartitionSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43130 from MaxGekk/remove-verifyPartitionPath. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../org/apache/spark/sql/internal/SQLConf.scala| 17 ++--- .../apache/spark/sql/internal/SQLConfSuite.scala | 4 +-- .../org/apache/spark/sql/hive/TableReader.scala| 41 +- .../spark/sql/hive/QueryPartitionSuite.scala | 12 ++- 4 files changed, 8 insertions(+), 66 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 43eb0756d8d..aeef531dbcd 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -34,7 +34,6 @@ import org.apache.hadoop.fs.Path import org.apache.spark.{ErrorMessageFormat, SparkConf, SparkContext, TaskContext} import org.apache.spark.internal.Logging import org.apache.spark.internal.config._ -import org.apache.spark.internal.config.{IGNORE_MISSING_FILES => SPARK_IGNORE_MISSING_FILES} import org.apache.spark.network.util.ByteUnit import org.apache.spark.sql.catalyst.ScalaReflection import org.apache.spark.sql.catalyst.analysis.{HintErrorLogger, Resolver} @@ -1261,14 +1260,6 @@ object SQLConf { .booleanConf .createWithDefault(false) - val HIVE_VERIFY_PARTITION_PATH = buildConf("spark.sql.hive.verifyPartitionPath") -.doc("When true, check all the partition paths under the table\'s root directory " + - "when reading data stored in HDFS. 
This configuration will be deprecated in the future " + - s"releases and replaced by ${SPARK_IGNORE_MISSING_FILES.key}.") -.version("1.4.0") -.booleanConf -.createWithDefault(false) - val HIVE_METASTORE_DROP_PARTITION_BY_NAME = buildConf("spark.sql.hive.dropPartitionByName.enabled") .doc("When true, Spark will get partition name rather than partition object " + @@ -4472,8 +4463,6 @@ object SQLConf { PANDAS_GROUPED_MAP_ASSIGN_COLUMNS_BY_NAME.key, "2.4", "The config allows to switch to the behaviour before Spark 2.4 " + "and will be removed in the future releases."), - DeprecatedConfig(HIVE_VERIFY_PARTITION_PATH.key, "3.0", -s"This config is replaced by '${SPARK_IGNORE_MISSING_FILES.key}'."), DeprecatedConfig(ARROW_EXECUTION_ENABLED.key, "3.0", s"Use '${ARROW_PYSPARK_EXECUTION_ENABLED.key}' instead of it."), DeprecatedConfig(ARROW_FALLBACK_ENABLED.key, "3.0", @@ -4552,7 +4541,9 @@ object SQLConf { RemovedConfig("spark.sql.ansi.strictIndexOperator", "3.4.0", "true", "This was an internal configuration. It is not needed anymore since Spark SQL always " + "returns null when getting a map value with a non-existing key. See SPARK-40066 " + - "for more details.") + "for more details."), + RemovedConfig("spark.sql.hive.verifyPartitionPath", "4.0.0", "false", +s"This config was replaced by '${IGNORE_MISSING_FILES.key}'.") ) Map(configs.map { cfg => cfg.key -> cfg } : _*) @@ -4766,8 +4757,6 @@ class SQLConf extends Serializable with Logging with SqlApiConf { def isOrcSchemaMergingEnabled: Boolean = getConf(ORC_SCHEMA_MERGING_ENABLED) - def verifyPartitionPath: Boolean = getConf(HIVE_VERIFY_PAR
[spark] branch master updated: [MINOR][SQL] Remove duplicate cases of escaping characters in string literals
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ee21b12c395 [MINOR][SQL] Remove duplicate cases of escaping characters in string literals ee21b12c395 is described below commit ee21b12c395ac184c8ddc2f74b66f6e6285de5fa Author: Max Gekk AuthorDate: Thu Sep 28 21:18:40 2023 +0300 [MINOR][SQL] Remove duplicate cases of escaping characters in string literals ### What changes were proposed in this pull request? In the PR, I propose to remove some cases in `appendEscapedChar()` because they fall to the default case. The following tests check the cases: - https://github.com/apache/spark/blob/187e9a851758c0e9cec11edab2bc07d6f4404001/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ParserUtilsSuite.scala#L97-L98 - https://github.com/apache/spark/blob/187e9a851758c0e9cec11edab2bc07d6f4404001/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ParserUtilsSuite.scala#L104 ### Why are the changes needed? To improve code maintainability. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running the affected test suite: ``` $ build/sbt "test:testOnly *.ParserUtilsSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43170 from MaxGekk/cleanup-escaping. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../scala/org/apache/spark/sql/catalyst/util/SparkParserUtils.scala| 3 --- 1 file changed, 3 deletions(-) diff --git a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkParserUtils.scala b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkParserUtils.scala index c318f208255..a4ce5fb1203 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkParserUtils.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkParserUtils.scala @@ -38,14 +38,11 @@ trait SparkParserUtils { def appendEscapedChar(n: Char): Unit = { n match { case '0' => sb.append('\u0000') -case '\'' => sb.append('\'') -case '"' => sb.append('\"') case 'b' => sb.append('\b') case 'n' => sb.append('\n') case 'r' => sb.append('\r') case 't' => sb.append('\t') case 'Z' => sb.append('\u001A') -case '\\' => sb.append('\\') // The following 2 lines are exactly what MySQL does TODO: why do we do this? case '%' => sb.append("\\%") case '_' => sb.append("\\_") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
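The fall-through behaviour that makes the deleted cases redundant can be sketched outside of Spark. The helper below is a hypothetical mirror of `appendEscapedChar()`, not the actual `SparkParserUtils` code:

```java
public class EscapeDemo {
    // The quote, double-quote, and backslash cases were removable because the
    // default branch already appends the character unchanged.
    static void appendEscapedChar(StringBuilder sb, char n) {
        switch (n) {
            case '0': sb.append('\0'); break;      // NUL
            case 'b': sb.append('\b'); break;
            case 'n': sb.append('\n'); break;
            case 'r': sb.append('\r'); break;
            case 't': sb.append('\t'); break;
            case 'Z': sb.append('\u001A'); break;  // SUB (end-of-file marker)
            // MySQL-compatible: keep the backslash before SQL wildcards
            case '%': sb.append("\\%"); break;
            case '_': sb.append("\\_"); break;
            // '\'' , '"' and '\\' all land here and are appended as-is
            default:  sb.append(n); break;
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (char c : new char[] {'n', '\'', '"', '\\', '%'}) {
            appendEscapedChar(sb, c);
        }
        System.out.println(sb.toString().equals("\n'\"\\\\%")); // true
    }
}
```

This is why deleting the dedicated `'\''`, `'"'`, and `'\\'` cases changes nothing observable: the default case produces the identical output.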
[spark] branch master updated: [SPARK-45398][SQL] Append `ESCAPE` in `sql()` of the `Like` expression
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new cc4ecb5104e [SPARK-45398][SQL] Append `ESCAPE` in `sql()` of the `Like` expression cc4ecb5104e is described below commit cc4ecb5104e37d5e530d44b41fc1d8f8116e37d8 Author: Max Gekk AuthorDate: Wed Oct 4 11:35:05 2023 +0300 [SPARK-45398][SQL] Append `ESCAPE` in `sql()` of the `Like` expression ### What changes were proposed in this pull request? In the PR, I propose to fix the `sql()` method of the `Like` expression, and append the `ESCAPE` clause when the `escapeChar` is not the default one `\\`. ### Why are the changes needed? 1. To be consistent to the `toString()` method 2. To distinguish column names when the escape argument is set. Before the changes, columns might conflict like the example below, and that could confuse users: ```sql spark-sql (default)> create temp view tbl as (SELECT 'a|_' like 'a||_' escape '|', 'a|_' like 'a||_' escape 'a'); [COLUMN_ALREADY_EXISTS] The column `a|_ like a||_` already exists. Consider to choose another name or rename the existing column. ``` ### Does this PR introduce _any_ user-facing change? Should not. ### How was this patch tested? Manually checking the column name by: ```sql spark-sql (default)> create temp view tbl as (SELECT 'a|_' like 'a||_' escape '|', 'a|_' like 'a||_' escape 'a'); Time taken: 0.531 seconds spark-sql (default)> describe extended tbl; a|_ LIKE a||_ ESCAPE '|'boolean a|_ LIKE a||_ ESCAPE 'a'boolean ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43196 from MaxGekk/fix-like-sql. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../explain-results/function_like_with_escape.explain | 2 +- .../spark/sql/catalyst/expressions/regexpExpressions.scala| 11 +++ 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_like_with_escape.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_like_with_escape.explain index 471a3a4bd52..1a15a27d97e 100644 --- a/connector/connect/common/src/test/resources/query-tests/explain-results/function_like_with_escape.explain +++ b/connector/connect/common/src/test/resources/query-tests/explain-results/function_like_with_escape.explain @@ -1,2 +1,2 @@ -Project [g#0 LIKE g#0 ESCAPE '/' AS g LIKE g#0] +Project [g#0 LIKE g#0 ESCAPE '/' AS g LIKE g ESCAPE '/'#0] +- LocalRelation , [id#0L, a#0, b#0, d#0, e#0, f#0, g#0] diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala index 5ebfdd919b8..69d90296d7f 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala @@ -133,12 +133,15 @@ case class Like(left: Expression, right: Expression, escapeChar: Char) final override val nodePatterns: Seq[TreePattern] = Seq(LIKE_FAMLIY) - override def toString: String = escapeChar match { -case '\\' => s"$left LIKE $right" -case c => s"$left LIKE $right ESCAPE '$c'" + override def toString: String = { +val escapeSuffix = if (escapeChar == '\\') "" else s" ESCAPE '$escapeChar'" +s"$left ${prettyName.toUpperCase(Locale.ROOT)} $right" + escapeSuffix } - override def sql: String = s"${left.sql} ${prettyName.toUpperCase(Locale.ROOT)} ${right.sql}" + override def sql: String = { +val escapeSuffix = if (escapeChar == '\\') "" else 
s" ESCAPE ${Literal(escapeChar).sql}" +s"${left.sql} ${prettyName.toUpperCase(Locale.ROOT)} ${right.sql}" + escapeSuffix + } override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val patternClass = classOf[Pattern].getName
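The fixed `sql()`/`toString()` logic can be illustrated with a stand-alone sketch. The `likeSql` helper below is hypothetical; the real Spark code renders the escape character via `Literal(escapeChar).sql`:

```java
public class LikeSqlDemo {
    // Append the ESCAPE clause only when the escape character is not the
    // default backslash, so the two columns in the example above get
    // distinct generated names.
    static String likeSql(String left, String right, char escapeChar) {
        String escapeSuffix =
            (escapeChar == '\\') ? "" : " ESCAPE '" + escapeChar + "'";
        return left + " LIKE " + right + escapeSuffix;
    }

    public static void main(String[] args) {
        System.out.println(likeSql("'a|_'", "'a||_'", '|'));  // 'a|_' LIKE 'a||_' ESCAPE '|'
        System.out.println(likeSql("'a|_'", "'a||_'", '\\')); // 'a|_' LIKE 'a||_'
    }
}
```

With the suffix included, the two expressions in the `COLUMN_ALREADY_EXISTS` example render to different strings, so their auto-generated column names no longer collide.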
[spark] branch master updated: [SPARK-45400][SQL][DOCS] Refer to the unescaping rules from expression descriptions
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c0d9ca3be14c [SPARK-45400][SQL][DOCS] Refer to the unescaping rules from expression descriptions c0d9ca3be14c is described below commit c0d9ca3be14cb0ec8d8f9920d3ecc4aac3cf5adc Author: Max Gekk AuthorDate: Thu Oct 5 22:22:29 2023 +0300 [SPARK-45400][SQL][DOCS] Refer to the unescaping rules from expression descriptions ### What changes were proposed in this pull request? In the PR, I propose to refer to the unescaping rules added by https://github.com/apache/spark/pull/43152 from expression descriptions like in `Like`, see https://github.com/apache/spark/assets/1580697/6a332b50-f2c8-4549-848a-61519c9f964e ### Why are the changes needed? To improve user experience w/ Spark SQL. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manually generated docs and checked visually. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43203 from MaxGekk/link-to-escape-doc. Authored-by: Max Gekk Signed-off-by: Max Gekk --- docs/sql-ref-literals.md | 2 + .../catalyst/expressions/regexpExpressions.scala | 70 ++ 2 files changed, 47 insertions(+), 25 deletions(-) diff --git a/docs/sql-ref-literals.md b/docs/sql-ref-literals.md index e9447af71c54..2a02a22bd6f0 100644 --- a/docs/sql-ref-literals.md +++ b/docs/sql-ref-literals.md @@ -62,6 +62,8 @@ The following escape sequences are recognized in regular string literals (withou - `\_` -> `\_`; - `\<other char>` -> `<other char>`, skip the slash and leave the character as is. +The unescaping rules above can be turned off by setting the SQL config `spark.sql.parser.escapedStringLiterals` to `true`. 
+ Examples ```sql diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala index 69d90296d7ff..87ea8b5a102a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala @@ -77,7 +77,7 @@ abstract class StringRegexExpression extends BinaryExpression } } -// scalastyle:off line.contains.tab +// scalastyle:off line.contains.tab line.size.limit /** * Simple RegEx pattern matching function */ @@ -92,11 +92,14 @@ abstract class StringRegexExpression extends BinaryExpression _ matches any one character in the input (similar to . in posix regular expressions)\ % matches zero or more characters in the input (similar to .* in posix regular expressions) - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order - to match "\abc", the pattern should be "\\abc". + Since Spark 2.0, string literals are unescaped in our SQL parser, see the unescaping + rules at https://spark.apache.org/docs/latest/sql-ref-literals.html#string-literal";>String Literal. + For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it falls back to Spark 1.6 behavior regarding string literal parsing. For example, if the config is - enabled, the pattern to match "\abc" should be "\abc". + enabled, the pattern to match "\abc" should be "\abc". + It's recommended to use a raw string literal (with the `r` prefix) to avoid escaping + special characters in the pattern string if exists. * escape - an character added since Spark 3.0. The default escape character is the '\'. If an escape character precedes a special symbol or another escape character, the following character is matched literally. 
It is invalid to escape any other character. @@ -121,7 +124,7 @@ abstract class StringRegexExpression extends BinaryExpression """, since = "1.0.0", group = "predicate_funcs") -// scalastyle:on line.contains.tab +// scalastyle:on line.contains.tab line.size.limit case class Like(left: Expression, right: Expression, escapeChar: Char) extends StringRegexExpression { @@ -207,11 +210,14 @@ case class Like(left: Expression, right: Expression, escapeChar: Char) _ matches any one character in the input (similar to . in posix regular expressions) % matches zero or more characters in the
[spark] branch master updated: [SPARK-45262][SQL][TESTS][DOCS] Improve examples for regexp parameters
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e3b1bb117fe9 [SPARK-45262][SQL][TESTS][DOCS] Improve examples for regexp parameters e3b1bb117fe9 is described below commit e3b1bb117fe9bf0b17321e6359b7aa90f70a24b5 Author: Max Gekk AuthorDate: Fri Oct 6 22:34:40 2023 +0300 [SPARK-45262][SQL][TESTS][DOCS] Improve examples for regexp parameters ### What changes were proposed in this pull request? In the PR, I propose to add a few more examples for `LIKE`, `ILIKE`, `RLIKE`, `regexp_instr()`, `regexp_extract_all()` that highlight correctness of current description and test a couple more of corner cases. ### Why are the changes needed? The description of `LIKE` says: ``` ... in order to match "\abc", the pattern should be "\\abc" ``` but in Spark SQL shell: ```sql spark-sql (default)> SELECT c FROM t; \abc spark-sql (default)> SELECT c LIKE "\\abc" FROM t; [INVALID_FORMAT.ESC_IN_THE_MIDDLE] The format is invalid: '\\abc'. The escape character is not allowed to precede 'a'. spark-sql (default)> SELECT c LIKE "abc" FROM t; true ``` So, the description might confuse users since the pattern must contain 4 slashes when the pattern is a regular SQL string. New example shows that the pattern "\\abc" is correct if we take into account the string as a raw string: ```sql spark-sql (default)> SELECT c LIKE R"\\abc" FROM t; true ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running new and modified tests: ``` $ build/sbt "test:testOnly *.StringFunctionsSuite" $ build/sbt "sql/test:testOnly org.apache.spark.sql.expressions.ExpressionInfoSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43037 from MaxGekk/fix-like-doc. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../sql/catalyst/expressions/regexpExpressions.scala | 18 -- .../resources/sql-functions/sql-expression-schema.md | 2 +- .../org/apache/spark/sql/StringFunctionsSuite.scala| 5 + 3 files changed, 22 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala index 87ea8b5a102a..b33de303b5d5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala @@ -108,13 +108,15 @@ abstract class StringRegexExpression extends BinaryExpression Examples: > SELECT _FUNC_('Spark', '_park'); true + > SELECT '\\abc' AS S, S _FUNC_ r'\\abc', S _FUNC_ 'abc'; + \abc truetrue > SET spark.sql.parser.escapedStringLiterals=true; spark.sql.parser.escapedStringLiterals true > SELECT '%SystemDrive%\Users\John' _FUNC_ '\%SystemDrive\%\\Users%'; true > SET spark.sql.parser.escapedStringLiterals=false; spark.sql.parser.escapedStringLiterals false - > SELECT '%SystemDrive%\\Users\\John' _FUNC_ '\%SystemDrive\%Users%'; + > SELECT '%SystemDrive%\\Users\\John' _FUNC_ r'%SystemDrive%\\Users%'; true > SELECT '%SystemDrive%/Users/John' _FUNC_ '/%SystemDrive/%//Users%' ESCAPE '/'; true @@ -226,13 +228,15 @@ case class Like(left: Expression, right: Expression, escapeChar: Char) Examples: > SELECT _FUNC_('Spark', '_Park'); true + > SELECT '\\abc' AS S, S _FUNC_ r'\\abc', S _FUNC_ 'abc'; + \abc truetrue > SET spark.sql.parser.escapedStringLiterals=true; spark.sql.parser.escapedStringLiterals true > SELECT '%SystemDrive%\Users\John' _FUNC_ '\%SystemDrive\%\\users%'; true > SET spark.sql.parser.escapedStringLiterals=false; spark.sql.parser.escapedStringLiterals false - > SELECT '%SystemDrive%\\USERS\\John' _FUNC_ '\%SystemDrive\%Users%'; + > 
SELECT '%SystemDrive%\\USERS\\John' _FUNC_ r'%SystemDrive%\\Users%'; true > SELECT '%SystemDrive%/Users/John' _FUNC_ '/%SYSTEMDrive/%//Users%' ESCAPE '/'
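The wildcard and escape semantics exercised by these examples can be sketched as a small LIKE-to-regex translator. This is a hypothetical helper (Spark's actual translation lives in `StringUtils.escapeLikeRegex`, which additionally rejects invalid escape sequences such as the `ESC_IN_THE_MIDDLE` case shown above; this sketch skips that validation):

```java
import java.util.regex.Pattern;

public class LikeToRegexDemo {
    // '_' matches any single character, '%' any run of characters; the escape
    // character makes the following character literal.
    static String likeToRegex(String pattern, char escapeChar) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == escapeChar && i + 1 < pattern.length()) {
                // escaped character is matched literally
                out.append(Pattern.quote(String.valueOf(pattern.charAt(++i))));
            } else if (c == '_') {
                out.append('.');
            } else if (c == '%') {
                out.append(".*");
            } else {
                out.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 'a|_' LIKE 'a||_' ESCAPE '|' -> true: '||' is a literal '|', '_' is a wildcard
        System.out.println(Pattern.matches(likeToRegex("a||_", '|'), "a|_"));   // true
        System.out.println(Pattern.matches(likeToRegex("a%c", '\\'), "abbbc")); // true
    }
}
```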
[spark] branch master updated: [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4493b431192 [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match 4493b431192 is described below commit 4493b431192fcdbab1379b7ffb89eea0cdaa19f1 Author: Jia Fan AuthorDate: Mon Oct 9 12:30:20 2023 +0300 [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match ### What changes were proposed in this pull request? When a custom pattern is used to parse a timestamp and only a prefix of the input matches the pattern, `Iso8601TimestampFormatter::parseOptional` and `Iso8601TimestampFormatter::parseWithoutTimeZoneOptional` should not return a non-empty result. E.g. for pattern = `yyyy-MM-dd HH:mm:ss` and value = `9999-12-31 23:59:59.999`: the pattern parses `9999-12-31 23:59:59` normally, but the value has the trailing suffix `.999`, so a non-empty result must not be returned. This bug affects schema inference in CSV/JSON. ### Why are the changes needed? Fix a schema-inference bug. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added a new test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43245 from Hisoka-X/SPARK-45424-inference-schema-unresolved. 
Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../apache/spark/sql/catalyst/util/TimestampFormatter.scala| 10 ++ .../spark/sql/catalyst/util/TimestampFormatterSuite.scala | 10 ++ 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala index 8a288d0e9f3..55eee41c14c 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala @@ -167,8 +167,9 @@ class Iso8601TimestampFormatter( override def parseOptional(s: String): Option[Long] = { try { - val parsed = formatter.parseUnresolved(s, new ParsePosition(0)) - if (parsed != null) { + val parsePosition = new ParsePosition(0) + val parsed = formatter.parseUnresolved(s, parsePosition) + if (parsed != null && s.length == parsePosition.getIndex) { Some(extractMicros(parsed)) } else { None @@ -196,8 +197,9 @@ class Iso8601TimestampFormatter( override def parseWithoutTimeZoneOptional(s: String, allowTimeZone: Boolean): Option[Long] = { try { - val parsed = formatter.parseUnresolved(s, new ParsePosition(0)) - if (parsed != null) { + val parsePosition = new ParsePosition(0) + val parsed = formatter.parseUnresolved(s, parsePosition) + if (parsed != null && s.length == parsePosition.getIndex) { Some(extractMicrosNTZ(s, parsed, allowTimeZone)) } else { None diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala index ecd849dd3af..d2fc89a034f 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala @@ -491,4 +491,14 @@ class TimestampFormatterSuite 
extends DatetimeFormatterSuite { assert(simpleFormatter.parseOptional("abc").isEmpty) } + + test("SPARK-45424: do not return optional parse results when only prefix match") { +val formatter = new Iso8601TimestampFormatter( + "yyyy-MM-dd HH:mm:ss", + locale = DateFormatter.defaultLocale, + legacyFormat = LegacyDateFormats.SIMPLE_DATE_FORMAT, + isParsing = true, zoneId = DateTimeTestUtils.LA) +assert(formatter.parseOptional("9999-12-31 23:59:59.999").isEmpty) +assert(formatter.parseWithoutTimeZoneOptional("9999-12-31 23:59:59.999", true).isEmpty) + } }
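The core of the fix, checking that `parseUnresolved()` consumed the whole input, can be reproduced with plain `java.time`. This is a stand-alone sketch of the same guard, not Spark's `Iso8601TimestampFormatter` itself (which also converts the parsed result to microseconds):

```java
import java.text.ParsePosition;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.util.Optional;

public class PrefixParseDemo {
    // parseUnresolved() happily stops at a matching prefix; comparing the final
    // parse index with the input length rejects inputs with trailing characters.
    static Optional<TemporalAccessor> parseOptional(DateTimeFormatter formatter, String s) {
        ParsePosition pos = new ParsePosition(0);
        TemporalAccessor parsed = formatter.parseUnresolved(s, pos);
        if (parsed != null && s.length() == pos.getIndex()) {
            return Optional.of(parsed);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        DateTimeFormatter f = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        System.out.println(parseOptional(f, "9999-12-31 23:59:59").isPresent());     // true
        System.out.println(parseOptional(f, "9999-12-31 23:59:59.999").isPresent()); // false: ".999" left over
    }
}
```

Without the index check, the second call would return a value parsed from only the prefix, which is exactly what broke CSV/JSON schema inference.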
[spark] branch branch-3.5 updated: [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 5f8ae9a3dbd [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match 5f8ae9a3dbd is described below commit 5f8ae9a3dbd2c7624bffd588483c9916c302c081 Author: Jia Fan AuthorDate: Mon Oct 9 12:30:20 2023 +0300 [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match ### What changes were proposed in this pull request? When a custom pattern is used to parse a timestamp and only a prefix of the input matches the pattern, `Iso8601TimestampFormatter::parseOptional` and `Iso8601TimestampFormatter::parseWithoutTimeZoneOptional` should not return a non-empty result. E.g. for pattern = `yyyy-MM-dd HH:mm:ss` and value = `9999-12-31 23:59:59.999`: the pattern parses `9999-12-31 23:59:59` normally, but the value has the trailing suffix `.999`, so a non-empty result must not be returned. This bug affects schema inference in CSV/JSON. ### Why are the changes needed? Fix a schema-inference bug. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added a new test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43245 from Hisoka-X/SPARK-45424-inference-schema-unresolved. 
Authored-by: Jia Fan Signed-off-by: Max Gekk (cherry picked from commit 4493b431192fcdbab1379b7ffb89eea0cdaa19f1) Signed-off-by: Max Gekk --- .../apache/spark/sql/catalyst/util/TimestampFormatter.scala| 10 ++ .../spark/sql/catalyst/util/TimestampFormatterSuite.scala | 10 ++ 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala index 8a288d0e9f3..55eee41c14c 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala @@ -167,8 +167,9 @@ class Iso8601TimestampFormatter( override def parseOptional(s: String): Option[Long] = { try { - val parsed = formatter.parseUnresolved(s, new ParsePosition(0)) - if (parsed != null) { + val parsePosition = new ParsePosition(0) + val parsed = formatter.parseUnresolved(s, parsePosition) + if (parsed != null && s.length == parsePosition.getIndex) { Some(extractMicros(parsed)) } else { None @@ -196,8 +197,9 @@ class Iso8601TimestampFormatter( override def parseWithoutTimeZoneOptional(s: String, allowTimeZone: Boolean): Option[Long] = { try { - val parsed = formatter.parseUnresolved(s, new ParsePosition(0)) - if (parsed != null) { + val parsePosition = new ParsePosition(0) + val parsed = formatter.parseUnresolved(s, parsePosition) + if (parsed != null && s.length == parsePosition.getIndex) { Some(extractMicrosNTZ(s, parsed, allowTimeZone)) } else { None diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala index eb173bc7f8c..2134a0d6ecd 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala +++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala @@ -507,4 +507,14 @@ class TimestampFormatterSuite extends DatetimeFormatterSuite { assert(simpleFormatter.parseOptional("abc").isEmpty) } + + test("SPARK-45424: do not return optional parse results when only prefix match") { +val formatter = new Iso8601TimestampFormatter( + "yyyy-MM-dd HH:mm:ss", + locale = DateFormatter.defaultLocale, + legacyFormat = LegacyDateFormats.SIMPLE_DATE_FORMAT, + isParsing = true, zoneId = DateTimeTestUtils.LA) +assert(formatter.parseOptional("9999-12-31 23:59:59.999").isEmpty) +assert(formatter.parseWithoutTimeZoneOptional("9999-12-31 23:59:59.999", true).isEmpty) + } }
[spark] branch master updated (4493b431192 -> af800b50595)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 4493b431192 [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match add af800b50595 [SPARK-45459][SQL][TESTS][DOCS] Remove the last 2 extra spaces in the automatically generated `sql-error-conditions.md` file No new revisions were added by this update. Summary of changes: core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
[spark] branch branch-3.5 updated: [SPARK-45459][SQL][TESTS][DOCS] Remove the last 2 extra spaces in the automatically generated `sql-error-conditions.md` file
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 4841a404be3 [SPARK-45459][SQL][TESTS][DOCS] Remove the last 2 extra spaces in the automatically generated `sql-error-conditions.md` file 4841a404be3 is described below commit 4841a404be3c37fc16031a0119b321eefcb2faab Author: panbingkun AuthorDate: Mon Oct 9 12:32:14 2023 +0300 [SPARK-45459][SQL][TESTS][DOCS] Remove the last 2 extra spaces in the automatically generated `sql-error-conditions.md` file ### What changes were proposed in this pull request? The pr aims to remove the last 2 extra spaces in the automatically generated `sql-error-conditions.md` file. ### Why are the changes needed? - When I am work on another PR, I use the following command: ``` SPARK_GENERATE_GOLDEN_FILES=1 build/sbt \ "core/testOnly *SparkThrowableSuite -- -t \"Error classes match with document\"" ``` I found that in the automatically generated `sql-error-conditions.md` file, there are 2 extra spaces added at the end, Obviously, this is not what we expected, otherwise we would need to manually remove it, which is not in line with automation. - The git tells us this difference, as follows: https://github.com/apache/spark/assets/15246973/a68b657f-3a00-4405-9623-1f7ab9d44d82";> ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Pass GA. - Manually test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43274 from panbingkun/SPARK-45459. 
Authored-by: panbingkun Signed-off-by: Max Gekk (cherry picked from commit af800b505956ff26e03c5fc56b6cb4ac5c0efe2f) Signed-off-by: Max Gekk --- core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala index 0249cde5488..299bcea3f9e 100644 --- a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala @@ -253,8 +253,7 @@ class SparkThrowableSuite extends SparkFunSuite { | |Also see [SQLSTATE Codes](sql-error-conditions-sqlstates.html). | - |$sqlErrorParentDocContent - |""".stripMargin + |$sqlErrorParentDocContent""".stripMargin errors.filter(_._2.subClass.isDefined).foreach(error => { val name = error._1 @@ -316,7 +315,7 @@ class SparkThrowableSuite extends SparkFunSuite { } FileUtils.writeStringToFile( parentDocPath.toFile, - sqlErrorParentDoc + lineSeparator, + sqlErrorParentDoc, StandardCharsets.UTF_8) } } else {
[spark] branch master updated: [SPARK-45383][SQL] Fix error message for time travel with non-existing table
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ced321c8b5a [SPARK-45383][SQL] Fix error message for time travel with non-existing table ced321c8b5a is described below commit ced321c8b5a32c69dfb2841d4bec8a03f21b8038 Author: Wenchen Fan AuthorDate: Mon Oct 9 22:15:45 2023 +0300 [SPARK-45383][SQL] Fix error message for time travel with non-existing table ### What changes were proposed in this pull request? Fixes a small bug to report `TABLE_OR_VIEW_NOT_FOUND` error correctly for time travel. It was missed before because `RelationTimeTravel` is a leaf node but it may contain `UnresolvedRelation`. ### Why are the changes needed? bug fix ### Does this PR introduce _any_ user-facing change? Yes, the error message becomes reasonable ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #43298 from cloud-fan/time-travel. 
Authored-by: Wenchen Fan Signed-off-by: Max Gekk --- .../apache/spark/sql/catalyst/analysis/CheckAnalysis.scala| 4 .../org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala | 11 +++ 2 files changed, 15 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index e140625f47a..611dd7b3009 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -384,6 +384,9 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB }) operator match { + case RelationTimeTravel(u: UnresolvedRelation, _, _) => +u.tableNotFound(u.multipartIdentifier) + case etw: EventTimeWatermark => etw.eventTime.dataType match { case s: StructType @@ -396,6 +399,7 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB "eventName" -> toSQLId(etw.eventTime.name), "eventType" -> toSQLType(etw.eventTime.dataType))) } + case f: Filter if f.condition.dataType != BooleanType => f.failAnalysis( errorClass = "DATATYPE_MISMATCH.FILTER_NOT_BOOLEAN", diff --git a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala index ae639b272a2..047bc8de739 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala @@ -3014,6 +3014,17 @@ class DataSourceV2SQLSuiteV1Filter sqlState = None, parameters = Map("relationId" -> "`x`")) + checkError( +exception = intercept[AnalysisException] { + sql("SELECT * FROM non_exist VERSION AS OF 1") +}, +errorClass = "TABLE_OR_VIEW_NOT_FOUND", +parameters = Map("relationName" -> "`non_exist`"), +context = 
ExpectedContext( + fragment = "non_exist", + start = 14, + stop = 22)) + val subquery1 = "SELECT 1 FROM non_exist" checkError( exception = intercept[AnalysisException] {
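The root cause (a leaf plan node that hides an unresolved child from the generic traversal) can be sketched abstractly. The classes below are hypothetical stand-ins, not Spark's actual plan hierarchy:

```java
import java.util.Collections;
import java.util.List;

public class LeafNodeDemo {
    interface Node { List<Node> children(); }

    static class UnresolvedRelation implements Node {
        final String name;
        UnresolvedRelation(String name) { this.name = name; }
        public List<Node> children() { return Collections.emptyList(); }
    }

    // A "leaf" node reports no children, so a generic walk over children()
    // never reaches the unresolved relation it wraps; the analyzer needs an
    // explicit case for it, which is what the patch adds to CheckAnalysis.
    static class RelationTimeTravel implements Node {
        final Node table; // hidden child, deliberately not in children()
        RelationTimeTravel(Node table) { this.table = table; }
        public List<Node> children() { return Collections.emptyList(); }
    }

    static String check(Node plan) {
        if (plan instanceof RelationTimeTravel) {
            Node table = ((RelationTimeTravel) plan).table;
            if (table instanceof UnresolvedRelation) {
                return "TABLE_OR_VIEW_NOT_FOUND: `" + ((UnresolvedRelation) table).name + "`";
            }
        }
        return "OK";
    }

    public static void main(String[] args) {
        System.out.println(check(new RelationTimeTravel(new UnresolvedRelation("non_exist"))));
    }
}
```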
[spark] branch branch-3.5 updated: [SPARK-45383][SQL] Fix error message for time travel with non-existing table
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 8bf5a5bca3f [SPARK-45383][SQL] Fix error message for time travel with non-existing table 8bf5a5bca3f is described below commit 8bf5a5bca3f9f7db78182d14e56476d384f442fa Author: Wenchen Fan AuthorDate: Mon Oct 9 22:15:45 2023 +0300 [SPARK-45383][SQL] Fix error message for time travel with non-existing table ### What changes were proposed in this pull request? Fixes a small bug to report `TABLE_OR_VIEW_NOT_FOUND` error correctly for time travel. It was missed before because `RelationTimeTravel` is a leaf node but it may contain `UnresolvedRelation`. ### Why are the changes needed? bug fix ### Does this PR introduce _any_ user-facing change? Yes, the error message becomes reasonable ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #43298 from cloud-fan/time-travel. 
Authored-by: Wenchen Fan Signed-off-by: Max Gekk (cherry picked from commit ced321c8b5a32c69dfb2841d4bec8a03f21b8038) Signed-off-by: Max Gekk --- .../apache/spark/sql/catalyst/analysis/CheckAnalysis.scala| 4 .../org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala | 11 +++ 2 files changed, 15 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 511f3622e7e..533ea8a2b79 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -365,6 +365,9 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB }) operator match { + case RelationTimeTravel(u: UnresolvedRelation, _, _) => +u.tableNotFound(u.multipartIdentifier) + case etw: EventTimeWatermark => etw.eventTime.dataType match { case s: StructType @@ -377,6 +380,7 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB "eventName" -> toSQLId(etw.eventTime.name), "eventType" -> toSQLType(etw.eventTime.dataType))) } + case f: Filter if f.condition.dataType != BooleanType => f.failAnalysis( errorClass = "DATATYPE_MISMATCH.FILTER_NOT_BOOLEAN", diff --git a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala index 06f5600e0d1..7745e9c0a4e 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala @@ -3014,6 +3014,17 @@ class DataSourceV2SQLSuiteV1Filter sqlState = None, parameters = Map("relationId" -> "`x`")) + checkError( +exception = intercept[AnalysisException] { + sql("SELECT * FROM non_exist VERSION AS OF 1") +}, +errorClass = 
"TABLE_OR_VIEW_NOT_FOUND", +parameters = Map("relationName" -> "`non_exist`"), +context = ExpectedContext( + fragment = "non_exist", + start = 14, + stop = 22)) + val subquery1 = "SELECT 1 FROM non_exist" checkError( exception = intercept[AnalysisException] { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (f378b506bf1 -> 76230765674)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from f378b506bf1 [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind
     add 76230765674 [SPARK-45458][SQL] Convert IllegalArgumentException to SparkIllegalArgumentException in bitwiseExpressions

No new revisions were added by this update.

Summary of changes:
 .../src/main/resources/error/error-classes.json| 5 +++
 ...nditions-invalid-parameter-value-error-class.md | 4 +++
 .../catalyst/expressions/bitwiseExpressions.scala | 15 
 .../spark/sql/errors/QueryExecutionErrors.scala| 10 ++
 .../expressions/BitwiseExpressionsSuite.scala | 42 --
 .../resources/sql-tests/results/bitwise.sql.out| 26 +++---
 6 files changed, 79 insertions(+), 23 deletions(-)
[spark] branch master updated: [SPARK-45213][SQL] Assign name to the error _LEGACY_ERROR_TEMP_2151
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6373f19f537 [SPARK-45213][SQL] Assign name to the error _LEGACY_ERROR_TEMP_2151 6373f19f537 is described below commit 6373f19f537f69c6460b2e4097f19903c01a608f Author: dengziming AuthorDate: Tue Oct 10 15:36:18 2023 +0300 [SPARK-45213][SQL] Assign name to the error _LEGACY_ERROR_TEMP_2151 ### What changes were proposed in this pull request? Assign the name `EXPRESSION_DECODING_FAILED` to the legacy error class `_LEGACY_ERROR_TEMP_2151`. ### Why are the changes needed? To assign proper name as a part of activity in SPARK-37935. ### Does this PR introduce _any_ user-facing change? Yes, the error message will include the error class name ### How was this patch tested? An existing unit test to produce the error from user code. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43029 from dengziming/SPARK-45213. Authored-by: dengziming Signed-off-by: Max Gekk --- common/utils/src/main/resources/error/error-classes.json | 11 +-- docs/sql-error-conditions.md | 6 ++ .../org/apache/spark/sql/errors/QueryExecutionErrors.scala| 3 +-- .../spark/sql/catalyst/encoders/EncoderResolutionSuite.scala | 2 +- .../src/test/scala/org/apache/spark/sql/DatasetSuite.scala| 5 ++--- 5 files changed, 15 insertions(+), 12 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 690d1ae1a14..1239793b3f9 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -921,6 +921,11 @@ } } }, + "EXPRESSION_DECODING_FAILED" : { +"message" : [ + "Failed to decode a row to a value of the expressions: ." 
+] + }, "EXPRESSION_TYPE_IS_NOT_ORDERABLE" : { "message" : [ "Column expression cannot be sorted because its type is not orderable." @@ -5524,12 +5529,6 @@ "Due to Scala's limited support of tuple, tuple with more than 22 elements are not supported." ] }, - "_LEGACY_ERROR_TEMP_2151" : { -"message" : [ - "Error while decoding: ", - "." -] - }, "_LEGACY_ERROR_TEMP_2152" : { "message" : [ "Error while encoding: ", diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md index fda10eceb97..b4ee7358b52 100644 --- a/docs/sql-error-conditions.md +++ b/docs/sql-error-conditions.md @@ -551,6 +551,12 @@ The table `` does not support ``. For more details see [EXPECT_VIEW_NOT_TABLE](sql-error-conditions-expect-view-not-table-error-class.html) +### EXPRESSION_DECODING_FAILED + +SQLSTATE: none assigned + +Failed to decode a row to a value of the expressions: ``. + ### EXPRESSION_TYPE_IS_NOT_ORDERABLE SQLSTATE: none assigned diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala index bd4d7a3be7f..5396ae5ff70 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala @@ -1342,9 +1342,8 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase with ExecutionE def expressionDecodingError(e: Exception, expressions: Seq[Expression]): SparkRuntimeException = { new SparkRuntimeException( - errorClass = "_LEGACY_ERROR_TEMP_2151", + errorClass = "EXPRESSION_DECODING_FAILED", messageParameters = Map( -"e" -> e.toString(), "expressions" -> expressions.map( _.simpleString(SQLConf.get.maxToStringFields)).mkString("\n")), cause = e) diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/EncoderResolutionSuite.scala 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/EncoderResolutionSuite.scala index f4106e65e7c..7f54987ee7e 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/EncoderResolutionSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/EncoderResolutionSuite.scala @@ -172,7 +172,7 @@ class EncoderResolutionSuite extends PlanTest { val e = intercept[RuntimeException] { fromRow(InternalRow(new GenericArrayData(Array(1, null } -assert(e.getMessage.contains("Null value appe
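The renamed entry follows the error-class convention in which a placeholder such as `<expressions>` in the JSON message template is filled from `messageParameters`. A hedged sketch of that substitution (illustrative names, not Spark's actual `ErrorClassesJsonReader`):

```scala
// Sketch of template rendering for an error class like the new
// EXPRESSION_DECODING_FAILED entry. The map and function names here are
// illustrative assumptions, not Spark's real implementation.
val templates = Map(
  "EXPRESSION_DECODING_FAILED" ->
    "Failed to decode a row to a value of the expressions: <expressions>.")

def renderError(errorClass: String, params: Map[String, String]): String = {
  val template = templates(errorClass)
  // Replace each <name> placeholder with its parameter value.
  params.foldLeft(template) { case (msg, (k, v)) => msg.replace(s"<$k>", v) }
}

val msg = renderError("EXPRESSION_DECODING_FAILED", Map("expressions" -> "upcast(a)"))
```

This is also why the patch drops the separate `"e"` parameter: the underlying exception now travels as the `cause` rather than being interpolated into the message text.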
[spark] branch master updated (e1a7b84f47b -> ae112e4279f)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from e1a7b84f47b [SPARK-45397][ML][CONNECT] Add array assembler feature transformer
     add ae112e4279f [SPARK-45116][SQL] Add some comment for param of JdbcDialect `createTable`

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala| 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)
[spark] branch master updated: [SPARK-42881][SQL] Codegen Support for get_json_object
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c2525308330 [SPARK-42881][SQL] Codegen Support for get_json_object c2525308330 is described below commit c252530833097759b1f943ff89b05f22025f0dd0 Author: panbingkun AuthorDate: Wed Oct 11 17:42:48 2023 +0300 [SPARK-42881][SQL] Codegen Support for get_json_object ### What changes were proposed in this pull request? The PR adds Codegen Support for get_json_object. ### Why are the changes needed? Improve codegen coverage and performance. Github benchmark data(https://github.com/panbingkun/spark/actions/runs/4497396473/jobs/7912952710): https://user-images.githubusercontent.com/15246973/227117793-bab38c42-dcc1-46de-a689-25a87b8f3561.png";> Local benchmark data: https://user-images.githubusercontent.com/15246973/227098745-9b360e60-fe84-4419-8b7d-073a0530816a.png";> ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Add new UT. Pass GA. Closes #40506 from panbingkun/json_code_gen. 
Authored-by: panbingkun Signed-off-by: Max Gekk --- .../sql/catalyst/expressions/jsonExpressions.scala | 121 +--- sql/core/benchmarks/JsonBenchmark-results.txt | 127 +++-- .../org/apache/spark/sql/JsonFunctionsSuite.scala | 28 + .../execution/datasources/json/JsonBenchmark.scala | 15 ++- 4 files changed, 208 insertions(+), 83 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala index e7df542ddab..04bc457b66a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala @@ -28,7 +28,8 @@ import com.fasterxml.jackson.core.json.JsonReadFeature import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.DataTypeMismatch -import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback +import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, CodeGenerator, CodegenFallback, ExprCode} +import org.apache.spark.sql.catalyst.expressions.codegen.Block.BlockHelper import org.apache.spark.sql.catalyst.json._ import org.apache.spark.sql.catalyst.trees.TreePattern.{JSON_TO_STRUCT, TreePattern} import org.apache.spark.sql.catalyst.util._ @@ -125,13 +126,7 @@ private[this] object SharedFactory { group = "json_funcs", since = "1.5.0") case class GetJsonObject(json: Expression, path: Expression) - extends BinaryExpression with ExpectsInputTypes with CodegenFallback { - - import com.fasterxml.jackson.core.JsonToken._ - - import PathInstruction._ - import SharedFactory._ - import WriteStyle._ + extends BinaryExpression with ExpectsInputTypes { override def left: Expression = json override def right: Expression = path @@ -140,18 +135,114 @@ case class 
GetJsonObject(json: Expression, path: Expression) override def nullable: Boolean = true override def prettyName: String = "get_json_object" - @transient private lazy val parsedPath = parsePath(path.eval().asInstanceOf[UTF8String]) + @transient + private lazy val evaluator = if (path.foldable) { +new GetJsonObjectEvaluator(path.eval().asInstanceOf[UTF8String]) + } else { +new GetJsonObjectEvaluator() + } override def eval(input: InternalRow): Any = { -val jsonStr = json.eval(input).asInstanceOf[UTF8String] +evaluator.setJson(json.eval(input).asInstanceOf[UTF8String]) +if (!path.foldable) { + evaluator.setPath(path.eval(input).asInstanceOf[UTF8String]) +} +evaluator.evaluate() + } + + protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evaluatorClass = classOf[GetJsonObjectEvaluator].getName +val initEvaluator = path.foldable match { + case true if path.eval() != null => +val cachedPath = path.eval().asInstanceOf[UTF8String] +val refCachedPath = ctx.addReferenceObj("cachedPath", cachedPath) +s"new $evaluatorClass($refCachedPath)" + case _ => s"new $evaluatorClass()" +} +val evaluator = ctx.addMutableState(evaluatorClass, "evaluator", + v => s"""$v = $initEvaluator;""", forceInline = true) + +val jsonEval = json.genCode(ctx) +val pathEval = path.genCode(ctx) + +val setJson = + s""" +
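The foldable-path optimization visible in the diff — parse the JSON path once when it is a constant, otherwise set it per row — can be sketched with a simplified stand-in evaluator (illustrative only; the real `GetJsonObjectEvaluator` parses `PathInstruction`s and walks a Jackson parser):

```scala
// Sketch of the caching pattern in the patch: a constant (foldable) path is
// handed to the evaluator once at construction time; a non-foldable path is
// re-set for every row. The evaluate body is a stand-in, not real JSON logic.
class GetJsonObjectEvaluator(cachedPath: String = null) {
  private var json: String = _
  private var path: String = cachedPath
  def setJson(j: String): Unit = { json = j }
  def setPath(p: String): Unit = { path = p }
  // Stand-in for the real JSON traversal: report what would be evaluated.
  def evaluate(): String = s"eval(json=$json, path=$path)"
}

val pathIsFoldable = true
val evaluator =
  if (pathIsFoldable) new GetJsonObjectEvaluator("$.a") // path parsed once
  else new GetJsonObjectEvaluator()                     // path set per row

evaluator.setJson("""{"a": 1}""")
```

The generated code in the patch follows the same split: when the path is foldable, `ctx.addReferenceObj` bakes the cached path into the evaluator's constructor, and only `setJson` runs per row.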
[spark] branch master updated: [SPARK-45433][SQL] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new eae5c0e1efc [SPARK-45433][SQL] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat eae5c0e1efc is described below commit eae5c0e1efce83c2bb08754784db070be285285a Author: Jia Fan AuthorDate: Wed Oct 11 19:33:23 2023 +0300 [SPARK-45433][SQL] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat ### What changes were proposed in this pull request? This PR fix CSV/JSON schema inference when timestamps do not match specified timestampFormat will report error. ```scala //eg val csv = spark.read.option("timestampFormat", "-MM-dd'T'HH:mm:ss") .option("inferSchema", true).csv(Seq("2884-06-24T02:45:51.138").toDS()) csv.show() //error Caused by: java.time.format.DateTimeParseException: Text '2884-06-24T02:45:51.138' could not be parsed, unparsed text found at index 19 ``` This bug only happend when partition had one row. The data type should be `StringType` not `TimestampType` because the value not match `timestampFormat`. Use csv as eg, in `CSVInferSchema::tryParseTimestampNTZ`, first, use `timestampNTZFormatter.parseWithoutTimeZoneOptional` to inferring return `TimestampType`, if same partition had another row, it will use `tryParseTimestamp` to parse row with user defined `timestampFormat`, then found it can't be convert to timestamp with `timestampFormat`. Finally return `StringType`. But when only one row, we use `timestampNTZFormatter.parseWithoutTimeZoneOptional` to parse normally timestamp not r [...] ### Why are the changes needed? Fix schema inference bug. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? add new test. ### Was this patch authored or co-authored using generative AI tooling? 
No Closes #43243 from Hisoka-X/SPARK-45433-inference-mismatch-timestamp-one-row. Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala | 9 ++--- .../org/apache/spark/sql/catalyst/json/JsonInferSchema.scala | 8 +--- .../apache/spark/sql/catalyst/csv/CSVInferSchemaSuite.scala| 10 ++ .../apache/spark/sql/catalyst/json/JsonInferSchemaSuite.scala | 8 4 files changed, 29 insertions(+), 6 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala index 51586a0065e..ec01b56f9eb 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala @@ -27,7 +27,7 @@ import org.apache.spark.sql.catalyst.expressions.ExprUtils import org.apache.spark.sql.catalyst.util.{DateFormatter, TimestampFormatter} import org.apache.spark.sql.catalyst.util.LegacyDateFormats.FAST_DATE_FORMAT import org.apache.spark.sql.errors.QueryExecutionErrors -import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.internal.{LegacyBehaviorPolicy, SQLConf} import org.apache.spark.sql.types._ class CSVInferSchema(val options: CSVOptions) extends Serializable { @@ -202,8 +202,11 @@ class CSVInferSchema(val options: CSVOptions) extends Serializable { // We can only parse the value as TimestampNTZType if it does not have zone-offset or // time-zone component and can be parsed with the timestamp formatter. // Otherwise, it is likely to be a timestamp with timezone. 
-if (timestampNTZFormatter.parseWithoutTimeZoneOptional(field, false).isDefined) { - SQLConf.get.timestampType +val timestampType = SQLConf.get.timestampType +if ((SQLConf.get.legacyTimeParserPolicy == LegacyBehaviorPolicy.LEGACY || +timestampType == TimestampNTZType) && +timestampNTZFormatter.parseWithoutTimeZoneOptional(field, false).isDefined) { + timestampType } else { tryParseTimestamp(field) } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala index 5385afe8c93..4123c5290b6 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala @@ -32,7 +32,7 @@ import org.apache.spark.sql.catalyst.json.JacksonUtils.nextUntil import org.apache.spark.sql.catalyst.util._ import org.
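The corrected decision in `tryParseTimestampNTZ` can be sketched with simplified stand-in types (plain Scala, not Spark's actual classes): the lenient no-timezone parse is only trusted under the LEGACY parser policy or when the session timestamp type is `TimestampNTZType`; otherwise the value must match the user's `timestampFormat`, and a mismatching row falls back to `StringType` instead of being wrongly inferred as a timestamp.

```scala
// Sketch of the fixed inference decision. The booleans stand in for the two
// parse attempts in the real code: parseWithoutTimeZoneOptional(field) and
// parsing with the user-specified timestampFormat.
sealed trait SqlType
case object TimestampType extends SqlType
case object TimestampNTZType extends SqlType
case object StringType extends SqlType

def inferType(
    legacyPolicy: Boolean,          // legacyTimeParserPolicy == LEGACY
    sessionTimestampType: SqlType,  // SQLConf.get.timestampType
    ntzParseSucceeds: Boolean,      // lenient no-timezone parse succeeded
    formatParseSucceeds: Boolean    // user's timestampFormat matched
): SqlType = {
  if ((legacyPolicy || sessionTimestampType == TimestampNTZType) && ntzParseSucceeds) {
    sessionTimestampType
  } else if (formatParseSucceeds) {
    TimestampType
  } else {
    StringType // the fix: a row not matching timestampFormat stays a string
  }
}
```

The bug scenario from the commit message — a single-row partition where `2884-06-24T02:45:51.138` passes the lenient parse but not the format `-MM-dd'T'HH:mm:ss` — now lands in the `StringType` branch instead of throwing at read time.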
[spark] branch branch-3.5 updated: [SPARK-45433][SQL] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 7e3ddc1e582 [SPARK-45433][SQL] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat 7e3ddc1e582 is described below commit 7e3ddc1e582a6e4fa96bab608c4c2bbc2c93b449 Author: Jia Fan AuthorDate: Wed Oct 11 19:33:23 2023 +0300 [SPARK-45433][SQL] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat ### What changes were proposed in this pull request? This PR fix CSV/JSON schema inference when timestamps do not match specified timestampFormat will report error. ```scala //eg val csv = spark.read.option("timestampFormat", "-MM-dd'T'HH:mm:ss") .option("inferSchema", true).csv(Seq("2884-06-24T02:45:51.138").toDS()) csv.show() //error Caused by: java.time.format.DateTimeParseException: Text '2884-06-24T02:45:51.138' could not be parsed, unparsed text found at index 19 ``` This bug only happend when partition had one row. The data type should be `StringType` not `TimestampType` because the value not match `timestampFormat`. Use csv as eg, in `CSVInferSchema::tryParseTimestampNTZ`, first, use `timestampNTZFormatter.parseWithoutTimeZoneOptional` to inferring return `TimestampType`, if same partition had another row, it will use `tryParseTimestamp` to parse row with user defined `timestampFormat`, then found it can't be convert to timestamp with `timestampFormat`. Finally return `StringType`. But when only one row, we use `timestampNTZFormatter.parseWithoutTimeZoneOptional` to parse normally timestamp not r [...] ### Why are the changes needed? Fix schema inference bug. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? add new test. ### Was this patch authored or co-authored using generative AI tooling? 
No Closes #43243 from Hisoka-X/SPARK-45433-inference-mismatch-timestamp-one-row. Authored-by: Jia Fan Signed-off-by: Max Gekk (cherry picked from commit eae5c0e1efce83c2bb08754784db070be285285a) Signed-off-by: Max Gekk --- .../org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala | 9 ++--- .../org/apache/spark/sql/catalyst/json/JsonInferSchema.scala | 8 +--- .../apache/spark/sql/catalyst/csv/CSVInferSchemaSuite.scala| 10 ++ .../apache/spark/sql/catalyst/json/JsonInferSchemaSuite.scala | 8 4 files changed, 29 insertions(+), 6 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala index 51586a0065e..ec01b56f9eb 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala @@ -27,7 +27,7 @@ import org.apache.spark.sql.catalyst.expressions.ExprUtils import org.apache.spark.sql.catalyst.util.{DateFormatter, TimestampFormatter} import org.apache.spark.sql.catalyst.util.LegacyDateFormats.FAST_DATE_FORMAT import org.apache.spark.sql.errors.QueryExecutionErrors -import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.internal.{LegacyBehaviorPolicy, SQLConf} import org.apache.spark.sql.types._ class CSVInferSchema(val options: CSVOptions) extends Serializable { @@ -202,8 +202,11 @@ class CSVInferSchema(val options: CSVOptions) extends Serializable { // We can only parse the value as TimestampNTZType if it does not have zone-offset or // time-zone component and can be parsed with the timestamp formatter. // Otherwise, it is likely to be a timestamp with timezone. 
-if (timestampNTZFormatter.parseWithoutTimeZoneOptional(field, false).isDefined) { - SQLConf.get.timestampType +val timestampType = SQLConf.get.timestampType +if ((SQLConf.get.legacyTimeParserPolicy == LegacyBehaviorPolicy.LEGACY || +timestampType == TimestampNTZType) && +timestampNTZFormatter.parseWithoutTimeZoneOptional(field, false).isDefined) { + timestampType } else { tryParseTimestamp(field) } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala index 5385afe8c93..4123c5290b6 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala @@ -32,7 +32,7 @@ import org.apache.spark.sql.catalyst.json.Ja
[spark] branch branch-3.4 updated: [SPARK-45433][SQL][3.4] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new f985d716e164 [SPARK-45433][SQL][3.4] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat f985d716e164 is described below commit f985d716e164885575ec7f36a7782694411da024 Author: Jia Fan AuthorDate: Thu Oct 12 17:09:48 2023 +0500 [SPARK-45433][SQL][3.4] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat ### What changes were proposed in this pull request? This is a backport PR of #43243. Fix the bug of schema inference when timestamps do not match specified timestampFormat. Please check #43243 for detail. ### Why are the changes needed? Fix schema inference bug on 3.4. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? add new test. ### Was this patch authored or co-authored using generative AI tooling? Closes #43343 from Hisoka-X/backport-SPARK-45433-inference-schema. 
Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala | 8 ++-- .../org/apache/spark/sql/catalyst/json/JsonInferSchema.scala | 7 +-- .../apache/spark/sql/catalyst/csv/CSVInferSchemaSuite.scala| 10 ++ .../apache/spark/sql/catalyst/json/JsonInferSchemaSuite.scala | 8 4 files changed, 29 insertions(+), 4 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala index 51586a0065e9..dd8ac3985f19 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala @@ -28,6 +28,7 @@ import org.apache.spark.sql.catalyst.util.{DateFormatter, TimestampFormatter} import org.apache.spark.sql.catalyst.util.LegacyDateFormats.FAST_DATE_FORMAT import org.apache.spark.sql.errors.QueryExecutionErrors import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.internal.SQLConf.LegacyBehaviorPolicy import org.apache.spark.sql.types._ class CSVInferSchema(val options: CSVOptions) extends Serializable { @@ -202,8 +203,11 @@ class CSVInferSchema(val options: CSVOptions) extends Serializable { // We can only parse the value as TimestampNTZType if it does not have zone-offset or // time-zone component and can be parsed with the timestamp formatter. // Otherwise, it is likely to be a timestamp with timezone. 
-if (timestampNTZFormatter.parseWithoutTimeZoneOptional(field, false).isDefined) { - SQLConf.get.timestampType +val timestampType = SQLConf.get.timestampType +if ((SQLConf.get.legacyTimeParserPolicy == LegacyBehaviorPolicy.LEGACY || +timestampType == TimestampNTZType) && +timestampNTZFormatter.parseWithoutTimeZoneOptional(field, false).isDefined) { + timestampType } else { tryParseTimestamp(field) } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala index 5385afe8c935..7e4767750fd3 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala @@ -33,6 +33,7 @@ import org.apache.spark.sql.catalyst.util._ import org.apache.spark.sql.catalyst.util.LegacyDateFormats.FAST_DATE_FORMAT import org.apache.spark.sql.errors.QueryExecutionErrors import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.internal.SQLConf.LegacyBehaviorPolicy import org.apache.spark.sql.types._ import org.apache.spark.util.Utils @@ -148,11 +149,13 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable { val bigDecimal = decimalParser(field) DecimalType(bigDecimal.precision, bigDecimal.scale) } +val timestampType = SQLConf.get.timestampType if (options.prefersDecimal && decimalTry.isDefined) { decimalTry.get -} else if (options.inferTimestamp && +} else if (options.inferTimestamp && (SQLConf.get.legacyTimeParserPolicy == + LegacyBehaviorPolicy.LEGACY || timestampType == TimestampNTZType) && timestampNTZFormatter.parseWithoutTimeZoneOptional(field, false).isDefined) { - SQLConf.get.timestampType + timestampType } else if (options.inferTimestamp && timestampFormatter.parseOptional(field).isDefined) {
[spark] branch master updated: [SPARK-44262][SQL] Add `dropTable` and `getInsertStatement` to JdbcDialect
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6994bad5e6e [SPARK-44262][SQL] Add `dropTable` and `getInsertStatement` to JdbcDialect 6994bad5e6e is described below commit 6994bad5e6ea8700d48cbe20e9b406b89925adc7 Author: Jia Fan AuthorDate: Mon Oct 16 13:55:24 2023 +0500 [SPARK-44262][SQL] Add `dropTable` and `getInsertStatement` to JdbcDialect ### What changes were proposed in this pull request? 1. This PR add `dropTable` function to `JdbcDialect`. So user can override dropTable SQL by other JdbcDialect like Neo4J Neo4J Drop case ```sql MATCH (m:Person {name: 'Mark'}) DELETE m ``` 2. Also add `getInsertStatement` for same reason. Neo4J Insert case ```sql MATCH (p:Person {name: 'Jennifer'}) SET p.birthdate = date('1980-01-01') RETURN p ``` Neo4J SQL(in fact named `CQL`) not like normal SQL, but it have JDBC driver. ### Why are the changes needed? Make JdbcDialect more useful ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? exist test Closes #41855 from Hisoka-X/SPARK-44262_JDBCUtils_improve. Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../sql/execution/datasources/jdbc/JdbcUtils.scala | 14 +-- .../org/apache/spark/sql/jdbc/JdbcDialects.scala | 29 ++ 2 files changed, 35 insertions(+), 8 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala index fb9e11df188..f2b84810175 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala @@ -78,7 +78,8 @@ object JdbcUtils extends Logging with SQLConfHelper { * Drops a table from the JDBC database. 
*/ def dropTable(conn: Connection, table: String, options: JDBCOptions): Unit = { -executeStatement(conn, options, s"DROP TABLE $table") +val dialect = JdbcDialects.get(options.url) +executeStatement(conn, options, dialect.dropTable(table)) } /** @@ -114,22 +115,19 @@ object JdbcUtils extends Logging with SQLConfHelper { isCaseSensitive: Boolean, dialect: JdbcDialect): String = { val columns = if (tableSchema.isEmpty) { - rddSchema.fields.map(x => dialect.quoteIdentifier(x.name)).mkString(",") + rddSchema.fields } else { // The generated insert statement needs to follow rddSchema's column sequence and // tableSchema's column names. When appending data into some case-sensitive DBMSs like // PostgreSQL/Oracle, we need to respect the existing case-sensitive column names instead of // RDD column names for user convenience. - val tableColumnNames = tableSchema.get.fieldNames rddSchema.fields.map { col => -val normalizedName = tableColumnNames.find(f => conf.resolver(f, col.name)).getOrElse { +tableSchema.get.find(f => conf.resolver(f.name, col.name)).getOrElse { throw QueryCompilationErrors.columnNotFoundInSchemaError(col, tableSchema) } -dialect.quoteIdentifier(normalizedName) - }.mkString(",") + } } -val placeholders = rddSchema.fields.map(_ => "?").mkString(",") -s"INSERT INTO $table ($columns) VALUES ($placeholders)" +dialect.insertIntoTable(table, columns) } /** diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala index 22625523a04..37c378c294c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala @@ -193,6 +193,24 @@ abstract class JdbcDialect extends Serializable with Logging { statement.executeUpdate(s"CREATE TABLE $tableName ($strSchema) $createTableOptions") } + /** + * Returns an Insert SQL statement template for inserting a row into the target table via JDBC 
+ * conn. Use "?" as placeholder for each value to be inserted. + * E.g. `INSERT INTO t ("name", "age", "gender") VALUES (?, ?, ?)` + * + * @param table The name of the table. + * @param fields The fields of the row that will be inserted. + * @return The SQL query to use for insert data into table. + */ + @Since("4.0.0") + def insertIntoTable( + table: String, + fields: Array[StructField]): String = { +
[spark] branch master updated: [SPARK-45035][SQL] Fix ignoreCorruptFiles/ignoreMissingFiles with multiline CSV/JSON will report error
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 11e7ea4f11d [SPARK-45035][SQL] Fix ignoreCorruptFiles/ignoreMissingFiles with multiline CSV/JSON will report error 11e7ea4f11d is described below commit 11e7ea4f11df71e2942322b01fcaab57dac20c83 Author: Jia Fan AuthorDate: Wed Oct 18 11:06:43 2023 +0500 [SPARK-45035][SQL] Fix ignoreCorruptFiles/ignoreMissingFiles with multiline CSV/JSON will report error ### What changes were proposed in this pull request? Fix ignoreCorruptFiles/ignoreMissingFiles with multiline CSV/JSON will report error, it would be like: ```log org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4940.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4940.0 (TID 4031) (10.68.177.106 executor 0): com.univocity.parsers.common.TextParsingException: java.lang.IllegalStateException - Error reading from input Parser Configuration: CsvParserSettings: Auto configuration enabled=true Auto-closing enabled=true Autodetect column delimiter=false Autodetect quotes=false Column reordering enabled=true Delimiters for detection=null Empty value= Escape unquoted values=false Header extraction enabled=null Headers=null Ignore leading whitespaces=false Ignore leading whitespaces in quotes=false Ignore trailing whitespaces=false Ignore trailing whitespaces in quotes=false Input buffer size=1048576 Input reading on separate thread=false Keep escape sequences=false Keep quotes=false Length of content displayed on error=1000 Line separator detection enabled=true Maximum number of characters per column=-1 Maximum number of columns=20480 Normalize escaped line separators=true Null value= Number of records to read=all Processor=none Restricting data in exceptions=false RowProcessor error handler=null Selected fields=none Skip bits as 
whitespace=true Skip empty lines=true Unescaped quote handling=STOP_AT_DELIMITERFormat configuration: CsvFormat: Comment character=# Field delimiter=, Line separator (normalized)=\n Line separator sequence=\n Quote character=" Quote escape character=\ Quote escape escape character=null Internal state when error was thrown: line=0, column=0, record=0 at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:402) at com.univocity.parsers.common.AbstractParser.beginParsing(AbstractParser.java:277) at com.univocity.parsers.common.AbstractParser.beginParsing(AbstractParser.java:843) at org.apache.spark.sql.catalyst.csv.UnivocityParser$$anon$1.(UnivocityParser.scala:463) at org.apache.spark.sql.catalyst.csv.UnivocityParser$.convertStream(UnivocityParser.scala:46... ``` Because multiline CSV/JSON use `BinaryFileRDD` not `FileScanRDD`. Unlike `FileScanRDD`, when met corrupt files will check `ignoreCorruptFiles` config to avoid report IOException, `BinaryFileRDD` will not report error because it return normal `PortableDataStream`. So we should catch it when infer schema in lambda function. Also do same thing for `ignoreMissingFiles`. ### Why are the changes needed? Fix the bug when use mulitline mode with ignoreCorruptFiles/ignoreMissingFiles config. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? add new test. ### Was this patch authored or co-authored using generative AI tooling? No Closes #42979 from Hisoka-X/SPARK-45035_csv_multi_line. 
Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../spark/sql/catalyst/json/JsonInferSchema.scala | 18 +-- .../execution/datasources/csv/CSVDataSource.scala | 28 --- .../datasources/CommonFileDataSourceSuite.scala| 28 +++ .../sql/execution/datasources/csv/CSVSuite.scala | 58 +- .../sql/execution/datasources/json/JsonSuite.scala | 46 - 5 files changed, 142 insertions(+), 36 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala index 4123c5290b6..4d04b34876c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spar
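The shape of that fix can be sketched outside Spark: wrap the per-file schema-inference step so that missing or corrupt files are skipped or re-raised according to the two configs. A minimal, hedged Python sketch — `read_file` stands in for the real per-file inference function and is an assumption of this example, not Spark's API:

```python
def infer_over_files(paths, read_file, ignore_corrupt_files=False,
                     ignore_missing_files=False):
    """Collect per-file schemas, skipping files according to the two configs.

    Mirrors the idea of the fix: BinaryFileRDD hands back streams without
    checking the configs, so the inference lambda must catch the errors itself.
    """
    schemas = []
    for path in paths:
        try:
            schemas.append(read_file(path))
        except FileNotFoundError:
            if not ignore_missing_files:
                raise  # a missing file is fatal unless explicitly ignored
        except OSError:
            if not ignore_corrupt_files:
                raise  # a corrupt file is fatal unless explicitly ignored
    return schemas
```

With both flags off, the first bad file still fails the whole inference, which is the pre-existing `FileScanRDD` behavior the patch brings multiline CSV/JSON in line with.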
[spark] branch master updated: [SPARK-35926][SQL] Add support YearMonthIntervalType for width_bucket
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9d061e3 [SPARK-35926][SQL] Add support YearMonthIntervalType for width_bucket 9d061e3 is described below commit 9d061e3939a021c602c070fc13cef951a8f94c82 Author: PengLei AuthorDate: Fri Oct 15 17:15:50 2021 +0300 [SPARK-35926][SQL] Add support YearMonthIntervalType for width_bucket ### What changes were proposed in this pull request? Support width_bucket(YearMonthIntervalType, YearMonthIntervalType, YearMonthIntervalType, Long), which returns a long result, e.g.: ``` width_bucket(input_value, min_value, max_value, bucket_nums) width_bucket(INTERVAL '1' YEAR, INTERVAL '0' YEAR, INTERVAL '10' YEAR, 10) It divides the range between max_value and min_value into 10 buckets: [INTERVAL '0' YEAR, INTERVAL '1' YEAR), [INTERVAL '1' YEAR, INTERVAL '2' YEAR) .. [INTERVAL '9' YEAR, INTERVAL '10' YEAR) Then it calculates which bucket the given input_value falls into. ``` The function `width_bucket` was introduced in [SPARK-21117](https://issues.apache.org/jira/browse/SPARK-21117) ### Why are the changes needed? [35926](https://issues.apache.org/jira/browse/SPARK-35926) 1. The `WIDTH_BUCKET` function assigns values to buckets (individual segments) in an equiwidth histogram. The ANSI SQL standard syntax is as follows: `WIDTH_BUCKET(expression, min, max, buckets)`. [Reference](https://www.oreilly.com/library/view/sql-in-a/9780596155322/re91.html). 2. `WIDTH_BUCKET` currently supports only `Double`. Of course, we can cast `Int` to `Double` to use it, but we could not cast `YearMonthIntervalType` to `Double`. 3. It has a use case, e.g. a histogram of employees' years of service, where `years of service` is a column of `YearMonthIntervalType` data type. ### Does this PR introduce _any_ user-facing change? Yes.
The user can use `width_bucket` with YearMonthIntervalType. ### How was this patch tested? Add ut test Closes #33132 from Peng-Lei/SPARK-35926. Authored-by: PengLei Signed-off-by: Max Gekk --- .../sql/catalyst/expressions/mathExpressions.scala | 33 ++ .../expressions/MathExpressionsSuite.scala | 15 ++ .../test/resources/sql-tests/inputs/interval.sql | 2 ++ .../sql-tests/results/ansi/interval.sql.out| 18 +++- .../resources/sql-tests/results/interval.sql.out | 18 +++- .../org/apache/spark/sql/MathFunctionsSuite.scala | 17 +++ 6 files changed, 96 insertions(+), 7 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala index c14fa72..6c34ed6 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala @@ -25,7 +25,7 @@ import org.apache.spark.sql.catalyst.analysis.{FunctionRegistry, TypeCheckResult import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{TypeCheckFailure, TypeCheckSuccess} import org.apache.spark.sql.catalyst.expressions.codegen._ import org.apache.spark.sql.catalyst.expressions.codegen.Block._ -import org.apache.spark.sql.catalyst.util.NumberConverter +import org.apache.spark.sql.catalyst.util.{NumberConverter, TypeUtils} import org.apache.spark.sql.types._ import org.apache.spark.unsafe.types.UTF8String @@ -1613,6 +1613,10 @@ object WidthBucket { 5 > SELECT _FUNC_(-0.9, 5.2, 0.5, 2); 3 + > SELECT _FUNC_(INTERVAL '0' YEAR, INTERVAL '0' YEAR, INTERVAL '10' YEAR, 10); + 1 + > SELECT _FUNC_(INTERVAL '1' YEAR, INTERVAL '0' YEAR, INTERVAL '10' YEAR, 10); + 2 """, since = "3.1.0", group = "math_funcs") @@ -1623,16 +1627,35 @@ case class WidthBucket( numBucket: Expression) extends QuaternaryExpression with ImplicitCastInputTypes with NullIntolerant { - 
override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType, DoubleType, LongType) + override def inputTypes: Seq[AbstractDataType] = Seq( +TypeCollection(DoubleType, YearMonthIntervalType), +TypeCollection(DoubleType, YearMonthIntervalType), +TypeCollection(DoubleType, YearMonthIntervalType), +LongType) + + override def checkInputDataTypes(): TypeCheckResult = { +super.checkInputDataTypes() match { + case TypeCheckSuccess => +(value.dataType, minValue.dataType, ma
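The bucketing described above can be modeled in plain Python. This is a hedged sketch of PostgreSQL-style `WIDTH_BUCKET` semantics (ascending and descending bounds, with 0 and num_buckets + 1 for out-of-range values), not a transcription of Spark's `WidthBucket` implementation; a year-month interval is represented by its month count, matching its physical int encoding:

```python
import math

def width_bucket(value, min_value, max_value, num_buckets):
    """Assign value to one of num_buckets equi-width buckets over
    [min_value, max_value): 0 means below the range, num_buckets + 1 above it."""
    if num_buckets <= 0 or min_value == max_value:
        return None  # invalid arguments; this sketch yields no bucket
    if min_value < max_value:  # ascending range
        if value < min_value:
            return 0
        if value >= max_value:
            return num_buckets + 1
        return math.floor((value - min_value) * num_buckets
                          / (max_value - min_value)) + 1
    # descending range (min_value > max_value), mirrored
    if value > min_value:
        return 0
    if value <= max_value:
        return num_buckets + 1
    return math.floor((min_value - value) * num_buckets
                      / (min_value - max_value)) + 1
```

With ten buckets over 0..120 months, INTERVAL '1' YEAR (12 months) lands in bucket 2, matching the second docstring example in the diff above.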
[spark] branch master updated (c29bb02 -> 21fa3ce)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c29bb02 [SPARK-36965][PYTHON] Extend python test runner by logging out the temp output files add 21fa3ce [SPARK-35925][SQL] Support DayTimeIntervalType in width-bucket function No new revisions were added by this update. Summary of changes: .../sql/catalyst/expressions/mathExpressions.scala | 12 +--- .../catalyst/expressions/MathExpressionsSuite.scala | 20 .../src/test/resources/sql-tests/inputs/interval.sql | 2 ++ .../sql-tests/results/ansi/interval.sql.out | 18 +- .../resources/sql-tests/results/interval.sql.out | 18 +- 5 files changed, 65 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-36928][SQL] Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new fd8d5ad [SPARK-36928][SQL] Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray fd8d5ad is described below commit fd8d5ad2140d6405357b908dce2d00a21036dedb Author: PengLei AuthorDate: Thu Oct 28 14:52:41 2021 +0300 [SPARK-36928][SQL] Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray ### What changes were proposed in this pull request? 1. Handle the ANSI interval types in the `get` and `copy` methods of ColumnarArray 2. Handle the ANSI interval types in the `get` and `copy` methods of ColumnarBatchRow 3. Handle the ANSI interval types in the `get` and `copy` methods of ColumnarRow ### Why are the changes needed? [SPARK-36928](https://issues.apache.org/jira/browse/SPARK-36928) ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added a test case Closes #34421 from Peng-Lei/SPARK-36928.
Authored-by: PengLei Signed-off-by: Max Gekk --- .../apache/spark/sql/vectorized/ColumnarArray.java | 6 +- .../spark/sql/vectorized/ColumnarBatchRow.java | 8 +-- .../apache/spark/sql/vectorized/ColumnarRow.java | 8 +-- .../execution/vectorized/ColumnVectorSuite.scala | 69 ++ .../execution/vectorized/ColumnarBatchSuite.scala | 32 ++ 5 files changed, 113 insertions(+), 10 deletions(-) diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java b/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java index 147dd24..2fb6b3f 100644 --- a/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java +++ b/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java @@ -57,9 +57,11 @@ public final class ColumnarArray extends ArrayData { return UnsafeArrayData.fromPrimitiveArray(toByteArray()); } else if (dt instanceof ShortType) { return UnsafeArrayData.fromPrimitiveArray(toShortArray()); -} else if (dt instanceof IntegerType || dt instanceof DateType) { +} else if (dt instanceof IntegerType || dt instanceof DateType +|| dt instanceof YearMonthIntervalType) { return UnsafeArrayData.fromPrimitiveArray(toIntArray()); -} else if (dt instanceof LongType || dt instanceof TimestampType) { +} else if (dt instanceof LongType || dt instanceof TimestampType +|| dt instanceof DayTimeIntervalType) { return UnsafeArrayData.fromPrimitiveArray(toLongArray()); } else if (dt instanceof FloatType) { return UnsafeArrayData.fromPrimitiveArray(toFloatArray()); diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarBatchRow.java b/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarBatchRow.java index c6b7287e7..8c32d5c 100644 --- a/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarBatchRow.java +++ b/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarBatchRow.java @@ -52,9 +52,9 @@ public final class ColumnarBatchRow extends 
InternalRow { row.setByte(i, getByte(i)); } else if (dt instanceof ShortType) { row.setShort(i, getShort(i)); -} else if (dt instanceof IntegerType) { +} else if (dt instanceof IntegerType || dt instanceof YearMonthIntervalType) { row.setInt(i, getInt(i)); -} else if (dt instanceof LongType) { +} else if (dt instanceof LongType || dt instanceof DayTimeIntervalType) { row.setLong(i, getLong(i)); } else if (dt instanceof FloatType) { row.setFloat(i, getFloat(i)); @@ -151,9 +151,9 @@ public final class ColumnarBatchRow extends InternalRow { return getByte(ordinal); } else if (dataType instanceof ShortType) { return getShort(ordinal); -} else if (dataType instanceof IntegerType) { +} else if (dataType instanceof IntegerType || dataType instanceof YearMonthIntervalType) { return getInt(ordinal); -} else if (dataType instanceof LongType) { +} else if (dataType instanceof LongType || dataType instanceof DayTimeIntervalType) { return getLong(ordinal); } else if (dataType instanceof FloatType) { return getFloat(ordinal); diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarRow.java b/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarRow.java index 4b9d3c5..da4b242 100644 --- a/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarRow.java +++ b/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarRow.java @@ -61,9 +61,9 @@ public final class ColumnarRow extends InternalRow { row.setByte(i, getByte(i
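All three diffs make the same change: route the two ANSI interval types through the existing int and long branches, since a year-month interval is physically an int (a month count) and a day-time interval a long (microseconds). A hedged Python sketch of that dispatch, with type names as plain strings rather than Spark's classes:

```python
# Logical types grouped by the primitive that backs them in the column vectors.
INT_BACKED = {"IntegerType", "DateType", "YearMonthIntervalType"}
LONG_BACKED = {"LongType", "TimestampType", "DayTimeIntervalType"}

def physical_slot(data_type):
    """Return which primitive accessor (getInt vs. getLong) a type uses."""
    if data_type in INT_BACKED:
        return "int"
    if data_type in LONG_BACKED:
        return "long"
    raise TypeError(f"no primitive mapping sketched for {data_type}")
```

The fix is exactly the addition of the two interval names to these sets; everything downstream (get, copy, UnsafeArrayData conversion) already knows how to handle int- and long-backed columns.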
[spark] branch master updated: [SPARK-37138][SQL] Support ANSI Interval types in ApproxCountDistinctForIntervals/ApproximatePercentile/Percentile
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 08123a3 [SPARK-37138][SQL] Support ANSI Interval types in ApproxCountDistinctForIntervals/ApproximatePercentile/Percentile 08123a3 is described below commit 08123a3795683238352e5bf55452de381349fdd9 Author: Angerszh AuthorDate: Sat Oct 30 20:03:20 2021 +0300 [SPARK-37138][SQL] Support ANSI Interval types in ApproxCountDistinctForIntervals/ApproximatePercentile/Percentile ### What changes were proposed in this pull request? Support Ansi Interval types in the agg expressions: - ApproxCountDistinctForIntervals - ApproximatePercentile - Percentile ### Why are the changes needed? To improve user experience with Spark SQL. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added new UT. Closes #34412 from AngersZh/SPARK-37138. 
Authored-by: Angerszh Signed-off-by: Max Gekk --- .../ApproxCountDistinctForIntervals.scala | 13 +++--- .../aggregate/ApproximatePercentile.scala | 32 -- .../expressions/aggregate/Percentile.scala | 26 +--- .../ApproxCountDistinctForIntervalsSuite.scala | 6 ++- .../expressions/aggregate/PercentileSuite.scala| 8 ++-- ...ApproxCountDistinctForIntervalsQuerySuite.scala | 28 + .../sql/ApproximatePercentileQuerySuite.scala | 22 +- .../apache/spark/sql/PercentileQuerySuite.scala| 49 ++ 8 files changed, 153 insertions(+), 31 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala index a7e9a22..f3bf251 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala @@ -61,7 +61,8 @@ case class ApproxCountDistinctForIntervals( } override def inputTypes: Seq[AbstractDataType] = { -Seq(TypeCollection(NumericType, TimestampType, DateType, TimestampNTZType), ArrayType) +Seq(TypeCollection(NumericType, TimestampType, DateType, TimestampNTZType, + YearMonthIntervalType, DayTimeIntervalType), ArrayType) } // Mark as lazy so that endpointsExpression is not evaluated during tree transformation. 
@@ -79,14 +80,16 @@ case class ApproxCountDistinctForIntervals( TypeCheckFailure("The endpoints provided must be constant literals") } else { endpointsExpression.dataType match { -case ArrayType(_: NumericType | DateType | TimestampType | TimestampNTZType, _) => +case ArrayType(_: NumericType | DateType | TimestampType | TimestampNTZType | + _: AnsiIntervalType, _) => if (endpoints.length < 2) { TypeCheckFailure("The number of endpoints must be >= 2 to construct intervals") } else { TypeCheckSuccess } case _ => - TypeCheckFailure("Endpoints require (numeric or timestamp or date) type") + TypeCheckFailure("Endpoints require (numeric or timestamp or date or timestamp_ntz or " + +"interval year to month or interval day to second) type") } } } @@ -120,9 +123,9 @@ case class ApproxCountDistinctForIntervals( val doubleValue = child.dataType match { case n: NumericType => n.numeric.toDouble(value.asInstanceOf[n.InternalType]) -case _: DateType => +case _: DateType | _: YearMonthIntervalType => value.asInstanceOf[Int].toDouble -case TimestampType | TimestampNTZType => +case TimestampType | TimestampNTZType | _: DayTimeIntervalType => value.asInstanceOf[Long].toDouble } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala index 8cce79c..0dcb906 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala @@ -49,15 +49,16 @@ import org.apache.spark.sql.types._ * yields better accuracy, the default value is * DEFAULT_PERCENTILE_ACCURACY. */ +// scalas
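The endpoint check in `checkInputDataTypes` above requires at least two constant endpoints, because n endpoints define only n - 1 adjacent intervals. A small Python sketch of that rule — the error message is taken from the diff, while the helper itself is illustrative, not Spark's API:

```python
def endpoints_to_intervals(endpoints):
    """Turn sorted endpoints [e0, e1, ..., en] into the adjacent intervals
    [(e0, e1), (e1, e2), ...] after validating there are enough of them."""
    if len(endpoints) < 2:
        raise ValueError(
            "The number of endpoints must be >= 2 to construct intervals")
    return list(zip(endpoints, endpoints[1:]))
```

For year-month interval endpoints this operates on month counts, since that is the physical value the aggregate converts to double before bucketing.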
[spark] branch master updated (13c372d -> d43a678)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 13c372d [SPARK-37150][SQL] Migrate DESCRIBE NAMESPACE to use V2 command by default add d43a678 [SPARK-37161][SQL] RowToColumnConverter support AnsiIntervalType No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/execution/Columnar.scala | 4 +-- .../execution/vectorized/ColumnarBatchSuite.scala | 37 ++ 2 files changed, 39 insertions(+), 2 deletions(-)
[spark] branch master updated: [SPARK-37176][SQL] Sync JsonInferSchema#infer method's exception handle logic with JacksonParser#parse method
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ec6a3ae [SPARK-37176][SQL] Sync JsonInferSchema#infer method's exception handle logic with JacksonParser#parse method ec6a3ae is described below commit ec6a3ae6dff1dc9c63978ae14a1793ccd771 Author: Xianjin YE AuthorDate: Tue Nov 2 12:40:09 2021 +0300 [SPARK-37176][SQL] Sync JsonInferSchema#infer method's exception handle logic with JacksonParser#parse method ### What changes were proposed in this pull request? Change `JsonInferSchema#infer`'s exception handling logic to align with `JacksonParser#parse`. ### Why are the changes needed? To reduce behavior inconsistency: users can have the same expectations for schema inference and JSON parsing when dealing with malformed input. ### Does this PR introduce _any_ user-facing change? Yes. Before this patch, inferring a JSON schema could fail for some malformed input on which parsing succeeded. After this patch, they share the same exception handling logic. ### How was this patch tested? Added one new test and modified one existing test to cover the new case. Closes #34455 from advancedxy/SPARK-37176.
Authored-by: Xianjin YE Signed-off-by: Max Gekk --- .../spark/sql/catalyst/json/JsonInferSchema.scala | 33 +++- .../test/resources/test-data/malformed_utf8.json | 3 ++ .../sql/execution/datasources/json/JsonSuite.scala | 35 ++ 3 files changed, 63 insertions(+), 8 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala index 3b17cde..3b62b16 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala @@ -17,6 +17,8 @@ package org.apache.spark.sql.catalyst.json +import java.io.CharConversionException +import java.nio.charset.MalformedInputException import java.util.Comparator import scala.util.control.Exception.allCatch @@ -45,6 +47,18 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable { legacyFormat = FAST_DATE_FORMAT, isParsing = true) + private def handleJsonErrorsByParseMode(parseMode: ParseMode, + columnNameOfCorruptRecord: String, e: Throwable): Option[StructType] = { +parseMode match { + case PermissiveMode => +Some(StructType(Seq(StructField(columnNameOfCorruptRecord, StringType + case DropMalformedMode => +None + case FailFastMode => +throw QueryExecutionErrors.malformedRecordsDetectedInSchemaInferenceError(e) +} + } + /** * Infer the type of a collection of json records in three stages: * 1. 
Infer the type of each record @@ -68,14 +82,17 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable { Some(inferField(parser)) } } catch { - case e @ (_: RuntimeException | _: JsonProcessingException) => parseMode match { -case PermissiveMode => - Some(StructType(Seq(StructField(columnNameOfCorruptRecord, StringType -case DropMalformedMode => - None -case FailFastMode => - throw QueryExecutionErrors.malformedRecordsDetectedInSchemaInferenceError(e) - } + case e @ (_: RuntimeException | _: JsonProcessingException | +_: MalformedInputException) => +handleJsonErrorsByParseMode(parseMode, columnNameOfCorruptRecord, e) + case e: CharConversionException if options.encoding.isEmpty => +val msg = + """JSON parser cannot handle a character in its input. +|Specifying encoding as an input option explicitly might help to resolve the issue. +|""".stripMargin + e.getMessage +val wrappedCharException = new CharConversionException(msg) +wrappedCharException.initCause(e) +handleJsonErrorsByParseMode(parseMode, columnNameOfCorruptRecord, wrappedCharException) } }.reduceOption(typeMerger).toIterator } diff --git a/sql/core/src/test/resources/test-data/malformed_utf8.json b/sql/core/src/test/resources/test-data/malformed_utf8.json new file mode 100644 index 000..c57eb43 --- /dev/null +++ b/sql/core/src/test/resources/test-data/malformed_utf8.json @@ -0,0 +1,3 @@ +{"a": 1} +{"a": 1} +� \ No newline at end of file diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/dataso
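The extracted `handleJsonErrorsByParseMode` boils down to a three-way dispatch on the parse mode. A hedged Python sketch of that decision — the string constants and dict return value stand in for Spark's `ParseMode` objects and `StructType`:

```python
def handle_malformed(parse_mode, corrupt_record_column, error):
    """What a malformed record contributes to the inferred schema:
    PERMISSIVE    -> a single string column holding the raw record,
    DROPMALFORMED -> nothing (the record is silently dropped),
    FAILFAST      -> the whole inference fails."""
    if parse_mode == "PERMISSIVE":
        return {corrupt_record_column: "string"}
    if parse_mode == "DROPMALFORMED":
        return None
    raise RuntimeError(
        "Malformed records are detected in schema inference") from error
```

The patch routes both the original `RuntimeException`/`JsonProcessingException` cases and the newly handled `MalformedInputException`/`CharConversionException` cases through this one function, which is what makes inference consistent with parsing.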
[spark] branch master updated: [SPARK-24774][SQL][FOLLOWUP] Remove unused code in SchemaConverters.scala
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 59c55dd [SPARK-24774][SQL][FOLLOWUP] Remove unused code in SchemaConverters.scala 59c55dd is described below commit 59c55dd4c6f7772ef7949653679a2b76211788e8 Author: Gengliang Wang AuthorDate: Wed Nov 3 08:43:25 2021 +0300 [SPARK-24774][SQL][FOLLOWUP] Remove unused code in SchemaConverters.scala ### What changes were proposed in this pull request? As MaxGekk pointed out in https://github.com/apache/spark/pull/22037/files#r741373793, there is some unused code in SchemaConverters.scala. The UUID generator was for generating `fix` avro field names, but we figured out a better solution during PR review. This PR removes the dead code. ### Why are the changes needed? Code cleanup. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing UT. Closes #34472 from gengliangwang/SPARK-24774-followup.
Authored-by: Gengliang Wang Signed-off-by: Max Gekk --- .../src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala | 4 1 file changed, 4 deletions(-) diff --git a/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala b/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala index 1c9b06b..347364c 100644 --- a/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala +++ b/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala @@ -18,14 +18,12 @@ package org.apache.spark.sql.avro import scala.collection.JavaConverters._ -import scala.util.Random import org.apache.avro.{LogicalTypes, Schema, SchemaBuilder} import org.apache.avro.LogicalTypes.{Date, Decimal, LocalTimestampMicros, LocalTimestampMillis, TimestampMicros, TimestampMillis} import org.apache.avro.Schema.Type._ import org.apache.spark.annotation.DeveloperApi -import org.apache.spark.sql.catalyst.util.RandomUUIDGenerator import org.apache.spark.sql.types._ import org.apache.spark.sql.types.Decimal.minBytesForPrecision @@ -35,8 +33,6 @@ import org.apache.spark.sql.types.Decimal.minBytesForPrecision */ @DeveloperApi object SchemaConverters { - private lazy val uuidGenerator = RandomUUIDGenerator(new Random().nextLong()) - private lazy val nullSchema = Schema.create(Schema.Type.NULL) /**
[spark] branch master updated: [SPARK-37261][SQL] Allow adding partitions with ANSI intervals in DSv2
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2a1267a [SPARK-37261][SQL] Allow adding partitions with ANSI intervals in DSv2 2a1267a is described below commit 2a1267aeb75bf838c74d1cf274aa258be060c17b Author: Max Gekk AuthorDate: Wed Nov 10 15:21:33 2021 +0300 [SPARK-37261][SQL] Allow adding partitions with ANSI intervals in DSv2 ### What changes were proposed in this pull request? In the PR, I propose to skip checking of ANSI interval types while creating or writing to a table using V2 catalogs. As a consequence, users can create tables in V2 catalogs partitioned by ANSI interval columns (the legacy intervals of `CalendarIntervalType` are still prohibited). This PR also adds a new test which checks: 1. Adding a new partition with ANSI intervals via `ALTER TABLE .. ADD PARTITION` 2. INSERT INTO a table partitioned by ANSI intervals for V1/V2 In-Memory catalogs (skips the V1 Hive external catalog). ### Why are the changes needed? To allow users to save ANSI intervals as partition values using DSv2. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running the new test for V1/V2 In-Memory and V1 Hive external catalogs: ``` $ build/sbt "test:testOnly org.apache.spark.sql.execution.command.v1.AlterTableAddPartitionSuite" $ build/sbt "test:testOnly org.apache.spark.sql.execution.command.v2.AlterTableAddPartitionSuite" $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly org.apache.spark.sql.hive.execution.command.AlterTableAddPartitionSuite" $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *DataSourceV2SQLSuite" ``` Closes #34537 from MaxGekk/alter-table-ansi-interval.
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../sql/catalyst/analysis/CheckAnalysis.scala | 6 ++-- .../apache/spark/sql/catalyst/util/TypeUtils.scala | 4 +-- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 16 + .../command/AlterTableAddPartitionSuiteBase.scala | 40 +- 4 files changed, 54 insertions(+), 12 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 5bf37a2..1a105ad 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -464,10 +464,12 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog { failAnalysis(s"Invalid partitioning: ${badReferences.mkString(", ")}") } -create.tableSchema.foreach(f => TypeUtils.failWithIntervalType(f.dataType)) +create.tableSchema.foreach(f => + TypeUtils.failWithIntervalType(f.dataType, forbidAnsiIntervals = false)) case write: V2WriteCommand if write.resolved => -write.query.schema.foreach(f => TypeUtils.failWithIntervalType(f.dataType)) +write.query.schema.foreach(f => + TypeUtils.failWithIntervalType(f.dataType, forbidAnsiIntervals = false)) case alter: AlterTableCommand => checkAlterTableCommand(alter) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala index cba3a9a..144508c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala @@ -98,8 +98,8 @@ object TypeUtils { case _ => false } - def failWithIntervalType(dataType: DataType): Unit = { -invokeOnceForInterval(dataType, forbidAnsiIntervals = true) { + def failWithIntervalType(dataType: DataType, forbidAnsiIntervals: 
Boolean = true): Unit = { +invokeOnceForInterval(dataType, forbidAnsiIntervals) { throw QueryCompilationErrors.cannotUseIntervalTypeInTableSchemaError() } } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala index f03792f..499638c 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala @@ -340,13 +340,15 @@ class DataSourceV2SQLSuite } test("CTAS/RTAS: invalid schema if has interval type") { -Seq("CREATE"
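The parameterized check can be modeled as a small gate: the legacy `CalendarIntervalType` is always rejected in table schemas, while the two ANSI interval types are rejected only when the flag asks for it (DSv2 create/write now passes `forbidAnsiIntervals = false`). A hedged Python sketch, with type names as plain strings and the error text an assumption of this example:

```python
ANSI_INTERVALS = {"YearMonthIntervalType", "DayTimeIntervalType"}

def fail_with_interval_type(data_type, forbid_ansi_intervals=True):
    """Raise if the type may not appear in a table schema.

    Legacy CalendarIntervalType is always forbidden; ANSI intervals are
    forbidden only when forbid_ansi_intervals is True.
    """
    if data_type == "CalendarIntervalType" or (
            forbid_ansi_intervals and data_type in ANSI_INTERVALS):
        raise TypeError("Cannot use interval type in the table schema.")
```

Keeping the default at `True` preserves the old behavior for every caller that was not updated, so only the two DSv2 paths touched in `CheckAnalysis` become more permissive.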
[spark] branch master updated (9191632 -> a4b8a8d)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9191632 [SPARK-36825][FOLLOWUP] Move the test code from `ParquetIOSuite` to `ParquetFileFormatSuite` add a4b8a8d [SPARK-37294][SQL][TESTS] Check inserting of ANSI intervals into a table partitioned by the interval columns No new revisions were added by this update. Summary of changes: .../spark/sql/connector/DataSourceV2SQLSuite.scala | 35 +- .../org/apache/spark/sql/sources/InsertSuite.scala | 34 + 2 files changed, 68 insertions(+), 1 deletion(-)
[spark] branch master updated: [SPARK-37304][SQL] Allow ANSI intervals in v2 `ALTER TABLE .. REPLACE COLUMNS`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 71f4ee3 [SPARK-37304][SQL] Allow ANSI intervals in v2 `ALTER TABLE .. REPLACE COLUMNS` 71f4ee3 is described below commit 71f4ee38c71734128c5653b8f18a7d0bf1014b6b Author: Max Gekk AuthorDate: Fri Nov 12 17:23:36 2021 +0300 [SPARK-37304][SQL] Allow ANSI intervals in v2 `ALTER TABLE .. REPLACE COLUMNS` ### What changes were proposed in this pull request? In the PR, I propose to allow ANSI intervals: year-month and day-time intervals in the `ALTER TABLE .. REPLACE COLUMNS` command for tables in v2 catalogs (v1 catalogs don't support the command). Also added unified test suite to migrate related tests in the future. ### Why are the changes needed? To improve user experience with Spark SQL. After the changes, users can replace columns with ANSI intervals instead of removing and adding such columns. ### Does this PR introduce _any_ user-facing change? In some sense, yes. After the changes, the command doesn't output any error message. ### How was this patch tested? By running new test suite: ``` $ build/sbt "test:testOnly *AlterTableReplaceColumnsSuite" ``` Closes #34571 from MaxGekk/add-replace-ansi-interval-col. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../plans/logical/v2AlterTableCommands.scala | 2 +- .../AlterTableReplaceColumnsSuiteBase.scala| 54 ++ .../command/v2/AlterTableReplaceColumnsSuite.scala | 28 +++ 3 files changed, 83 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala index 302a810..2eb828e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala @@ -134,7 +134,7 @@ case class ReplaceColumns( table: LogicalPlan, columnsToAdd: Seq[QualifiedColType]) extends AlterTableCommand { columnsToAdd.foreach { c => -TypeUtils.failWithIntervalType(c.dataType) +TypeUtils.failWithIntervalType(c.dataType, forbidAnsiIntervals = false) } override lazy val resolved: Boolean = table.resolved && columnsToAdd.forall(_.resolved) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableReplaceColumnsSuiteBase.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableReplaceColumnsSuiteBase.scala new file mode 100644 index 000..fed4076 --- /dev/null +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableReplaceColumnsSuiteBase.scala @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command + +import java.time.{Duration, Period} + +import org.apache.spark.sql.{QueryTest, Row} + +/** + * This base suite contains unified tests for the `ALTER TABLE .. REPLACE COLUMNS` command that + * check the V2 table catalog. The tests that cannot run for all supported catalogs are + * located in more specific test suites: + * + * - V2 table catalog tests: + * `org.apache.spark.sql.execution.command.v2.AlterTableReplaceColumnsSuite` + */ +trait AlterTableReplaceColumnsSuiteBase extends QueryTest with DDLCommandTestUtils { + override val command = "ALTER TABLE .. REPLACE COLUMNS" + + test("SPARK-37304: Replace columns by ANSI intervals") { +withNamespaceAndTable("ns", "tbl") { t => + sql(s"CREATE TABLE $t (ym INTERVAL MONTH, dt INTERVAL HOUR, data STRING) $defaultUsing") + // TODO(SPARK-37303): Uncomment the command below after REPLACE COLUMNS is fixed + // sql(s"INSERT INTO $t SELECT INTERVAL '1' MONTH, INTERVAL &
[spark] branch master updated: [SPARK-37332][SQL] Allow ANSI intervals in `ALTER TABLE .. ADD COLUMNS`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0f20678 [SPARK-37332][SQL] Allow ANSI intervals in `ALTER TABLE .. ADD COLUMNS` 0f20678 is described below commit 0f20678fc50aaf26359d9751fe96b15dc2e12540 Author: Max Gekk AuthorDate: Tue Nov 16 10:30:11 2021 +0300 [SPARK-37332][SQL] Allow ANSI intervals in `ALTER TABLE .. ADD COLUMNS` ### What changes were proposed in this pull request? In the PR, I propose to allow ANSI intervals: year-month and day-time intervals in the `ALTER TABLE .. ADD COLUMNS` command for tables in v1 and v2 In-Memory catalogs. Also added a unified test suite to migrate related tests in the future. ### Why are the changes needed? To improve user experience with Spark SQL. After the changes, users will be able to add columns with ANSI intervals instead of dropping the table and creating a new one. ### Does this PR introduce _any_ user-facing change? In some sense, yes. After the changes, the command doesn't output any error message. ### How was this patch tested? By running the new test suite: ``` $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableAddColumnsSuite" $ build/sbt -Phive-2.3 "test:testOnly *HiveDDLSuite" ``` Closes #34600 from MaxGekk/add-columns-ansi-intervals.
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../sql/catalyst/analysis/CheckAnalysis.scala | 6 +-- .../plans/logical/v2AlterTableCommands.scala | 2 +- .../apache/spark/sql/catalyst/util/TypeUtils.scala | 4 +- .../command/AlterTableAddColumnsSuiteBase.scala| 53 ++ .../command/v1/AlterTableAddColumnsSuite.scala | 38 .../command/v2/AlterTableAddColumnsSuite.scala | 28 .../spark/sql/hive/execution/HiveDDLSuite.scala| 5 -- .../command/AlterTableAddColumnsSuite.scala| 46 +++ 8 files changed, 170 insertions(+), 12 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 1a105ad..5bf37a2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -464,12 +464,10 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog { failAnalysis(s"Invalid partitioning: ${badReferences.mkString(", ")}") } -create.tableSchema.foreach(f => - TypeUtils.failWithIntervalType(f.dataType, forbidAnsiIntervals = false)) +create.tableSchema.foreach(f => TypeUtils.failWithIntervalType(f.dataType)) case write: V2WriteCommand if write.resolved => -write.query.schema.foreach(f => - TypeUtils.failWithIntervalType(f.dataType, forbidAnsiIntervals = false)) +write.query.schema.foreach(f => TypeUtils.failWithIntervalType(f.dataType)) case alter: AlterTableCommand => checkAlterTableCommand(alter) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala index 2eb828e..302a810 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala +++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala @@ -134,7 +134,7 @@ case class ReplaceColumns( table: LogicalPlan, columnsToAdd: Seq[QualifiedColType]) extends AlterTableCommand { columnsToAdd.foreach { c => -TypeUtils.failWithIntervalType(c.dataType, forbidAnsiIntervals = false) +TypeUtils.failWithIntervalType(c.dataType) } override lazy val resolved: Boolean = table.resolved && columnsToAdd.forall(_.resolved) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala index 144508c..729a26b 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala @@ -98,8 +98,8 @@ object TypeUtils { case _ => false } - def failWithIntervalType(dataType: DataType, forbidAnsiIntervals: Boolean = true): Unit = { -invokeOnceForInterval(dataType, forbidAnsiIntervals) { + def failWithIntervalType(dataType: DataType): Unit = { +
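The diffs above revolve around `TypeUtils.failWithIntervalType`, which rejects interval-typed columns and takes a `forbidAnsiIntervals` flag (default `true`) controlling whether the ANSI year-month and day-time interval types are rejected in addition to the legacy `CalendarIntervalType`. A rough Python model of that check — the type names and error text here are illustrative, not Spark's actual implementation:

```python
LEGACY_INTERVAL = {"interval"}  # stands in for CalendarIntervalType
ANSI_INTERVALS = {"interval year to month", "interval day to second"}

def fail_with_interval_type(data_type: str, forbid_ansi_intervals: bool = True) -> None:
    """Raise if the column type is an interval type that the command forbids.

    The legacy interval type is always rejected; ANSI intervals are rejected
    only when forbid_ansi_intervals is True.
    """
    if data_type in LEGACY_INTERVAL:
        raise TypeError(f"Cannot use {data_type!r} in the table schema.")
    if forbid_ansi_intervals and data_type in ANSI_INTERVALS:
        raise TypeError(f"Cannot use {data_type!r} in the table schema.")

# Callers that support ANSI intervals opt out of the stricter check:
fail_with_interval_type("interval year to month", forbid_ansi_intervals=False)  # no error
```

This mirrors why the commands in the diffs flip the flag per call site: commands that now support ANSI intervals pass `forbidAnsiIntervals = false`, while the rest keep the strict default.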
[spark] branch master updated (a6ca481 -> 7484c1b)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a6ca481 [SPARK-36346][SQL][FOLLOWUP] Rename `withAllOrcReaders` to `withAllNativeOrcReaders` add 7484c1b [SPARK-37468][SQL] Support ANSI intervals and TimestampNTZ for UnionEstimation No new revisions were added by this update. Summary of changes: .../logical/statsEstimation/UnionEstimation.scala | 8 +++- .../statsEstimation/UnionEstimationSuite.scala | 24 +++--- 2 files changed, 28 insertions(+), 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (f7be024 -> ce1f97f)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f7be024 [SPARK-37480][K8S][DOC] Sync Kubernetes configuration to latest in running-on-k8s.md add ce1f97f [SPARK-37326][SQL] Support TimestampNTZ in CSV data source No new revisions were added by this update. Summary of changes: docs/sql-data-sources-csv.md | 12 +- .../spark/sql/catalyst/csv/CSVInferSchema.scala| 24 +++ .../apache/spark/sql/catalyst/csv/CSVOptions.scala | 4 + .../sql/catalyst/csv/UnivocityGenerator.scala | 2 +- .../spark/sql/catalyst/csv/UnivocityParser.scala | 4 +- .../spark/sql/catalyst/util/DateTimeUtils.scala| 32 ++- .../sql/catalyst/util/TimestampFormatter.scala | 36 +++- .../spark/sql/errors/QueryExecutionErrors.scala| 8 +- .../sql/catalyst/util/DateTimeUtilsSuite.scala | 12 ++ .../org/apache/spark/sql/CsvFunctionsSuite.scala | 11 ++ .../sql/execution/datasources/csv/CSVSuite.scala | 216 - 11 files changed, 331 insertions(+), 30 deletions(-)
[spark] branch master updated: [SPARK-37508][SQL] Add CONTAINS() string function
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 710120a [SPARK-37508][SQL] Add CONTAINS() string function 710120a is described below commit 710120a499d6082bcec6b65ad1f8dbe4789f4bd9 Author: Angerszh AuthorDate: Wed Dec 1 12:57:22 2021 +0300 [SPARK-37508][SQL] Add CONTAINS() string function ### What changes were proposed in this pull request? Add the `CONTAINS` string function.

| function | arguments | Returns |
|---|---|---|
| CONTAINS(left, right) | left: String, right: String | Returns a BOOLEAN. The value is True if right is found inside left. Returns NULL if either input expression is NULL. Otherwise, returns False. |

### Why are the changes needed? contains() is a convenience function commonly supported by a number of database systems: - https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#contains_substr - https://docs.snowflake.com/en/sql-reference/functions/contains.html Support of the function can make the migration from other systems to Spark SQL easier. ### Does this PR introduce _any_ user-facing change? Users can use `contains(left, right)`:

| Left | Right | Return |
|--|:-:|--:|
| null | "Spark SQL" | null |
| "Spark SQL" | null | null |
| null | null | null |
| "Spark SQL" | "Spark" | true |
| "Spark SQL" | "k SQL" | true |
| "Spark SQL" | "SPARK" | false |

### How was this patch tested? Added UT Closes #34761 from AngersZh/SPARK-37508.
Authored-by: Angerszh Signed-off-by: Max Gekk --- .../sql/catalyst/analysis/FunctionRegistry.scala | 1 + .../catalyst/expressions/stringExpressions.scala | 17 .../expressions/StringExpressionsSuite.scala | 9 .../sql-functions/sql-expression-schema.md | 3 +- .../sql-tests/inputs/string-functions.sql | 10 - .../results/ansi/string-functions.sql.out | 50 +- .../sql-tests/results/string-functions.sql.out | 50 +- 7 files changed, 136 insertions(+), 4 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala index 0668460..b2788f8 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala @@ -455,6 +455,7 @@ object FunctionRegistry { expression[Ascii]("ascii"), expression[Chr]("char", true), expression[Chr]("chr"), +expression[Contains]("contains"), expression[Base64]("base64"), expression[BitLength]("bit_length"), expression[Length]("char_length", true), diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala index 2b997da..959c834 100755 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala @@ -465,6 +465,23 @@ abstract class StringPredicate extends BinaryExpression /** * A function that returns true if the string `left` contains the string `right`. */ +@ExpressionDescription( + usage = """ +_FUNC_(expr1, expr2) - Returns a boolean value if expr2 is found inside expr1. +Returns NULL if either input expression is NULL. 
+ """, + examples = """ +Examples: + > SELECT _FUNC_('Spark SQL', 'Spark'); + true + > SELECT _FUNC_('Spark SQL', 'SPARK'); + false + > SELECT _FUNC_('Spark SQL', null); + NULL + """, + since = "3.3.0", + group = "string_funcs" +) case class Contains(left: Expression, right: Expression) extends StringPredicate { override def compare(l: UTF8String, r: UTF8String): Boolean = l.contains(r) override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala index 823ce77..443a94b 100644 --- a/sql/catalyst/src
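The NULL-propagation rules from the commit message's truth table amount to three-valued logic. A small Python model (not Spark code) with `None` standing in for SQL NULL:

```python
def sql_contains(left, right):
    """CONTAINS(left, right): NULL (None) if either input is NULL,
    otherwise True iff right occurs inside left (case-sensitive)."""
    if left is None or right is None:
        return None
    return right in left

print(sql_contains("Spark SQL", "k SQL"))  # True
print(sql_contains("Spark SQL", "SPARK"))  # False (the match is case-sensitive)
print(sql_contains("Spark SQL", None))     # None (SQL NULL)
```

Note that a NULL on either side short-circuits to NULL before any substring check is attempted, which is exactly what the truth table specifies.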
[spark] branch master updated: [SPARK-37360][SQL] Support TimestampNTZ in JSON data source
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4f36978 [SPARK-37360][SQL] Support TimestampNTZ in JSON data source 4f36978 is described below commit 4f369789bd5d6cc81a85fe01a37e0ae90cbdeb6c Author: Ivan Sadikov AuthorDate: Mon Dec 6 13:24:46 2021 +0500 [SPARK-37360][SQL] Support TimestampNTZ in JSON data source ### What changes were proposed in this pull request? This PR adds support for the TimestampNTZ type in the JSON data source. Most of the functionality has already been added; this patch verifies that writes and reads work for the TimestampNTZ type and adds schema inference depending on the timestamp value format written. The following applies: - If there is a mixture of `TIMESTAMP_NTZ` and `TIMESTAMP_LTZ` values, use `TIMESTAMP_LTZ`. - If there are only `TIMESTAMP_NTZ` values, resolve using the default timestamp type configured with `spark.sql.timestampType`. In addition, I introduced a new JSON option `timestampNTZFormat` which is similar to `timestampFormat` but allows configuring the read/write pattern for `TIMESTAMP_NTZ` types. It is basically a copy of the timestamp pattern but without the timezone. ### Why are the changes needed? The PR fixes issues when writing and reading TimestampNTZ to and from JSON. ### Does this PR introduce _any_ user-facing change? Previously, the JSON data source would infer timestamp values as `TimestampType` when reading a JSON file. Now, the data source infers the timestamp value type based on the format (with or without timezone) and the default timestamp type based on `spark.sql.timestampType`. A new JSON option `timestampNTZFormat` is added to control the way values are formatted during writes or parsed during reads. ### How was this patch tested?
I extended `JsonSuite` with a few unit tests to verify that write-read roundtrip works for `TIMESTAMP_NTZ` and `TIMESTAMP_LTZ` values. Closes #34638 from sadikovi/timestamp-ntz-support-json. Authored-by: Ivan Sadikov Signed-off-by: Max Gekk --- docs/sql-data-sources-json.md | 10 +- .../spark/sql/catalyst/json/JSONOptions.scala | 9 +- .../spark/sql/catalyst/json/JacksonGenerator.scala | 2 +- .../spark/sql/catalyst/json/JacksonParser.scala| 4 +- .../spark/sql/catalyst/json/JsonInferSchema.scala | 12 ++ .../sql/execution/datasources/json/JsonSuite.scala | 194 - 6 files changed, 216 insertions(+), 15 deletions(-) diff --git a/docs/sql-data-sources-json.md b/docs/sql-data-sources-json.md index 5e3bd2b..b5f27aa 100644 --- a/docs/sql-data-sources-json.md +++ b/docs/sql-data-sources-json.md @@ -9,9 +9,9 @@ license: | The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at - + http://www.apache.org/licenses/LICENSE-2.0 - + Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. @@ -197,6 +197,12 @@ Data source options of JSON can be set via: read/write +timestampNTZFormat +-MM-dd'T'HH:mm:ss[.SSS] +Sets the string that indicates a timestamp without timezone format. Custom date formats follow the formats at https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html";>Datetime Patterns. This applies to timestamp without timezone type, note that zone-offset and time-zone components are not supported when writing or reading this data type. +read/write + + multiLine false Parse one record, which may span multiple lines, per file. 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala index 029c014..e801912 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala @@ -106,6 +106,10 @@ private[sql] class JSONOptions( s"${DateFormatter.defaultPattern}'T'HH:mm:ss[.SSS][XXX]" }) + val timestampNTZFormatInRead: Option[String] = parameters.get("timestampNTZFormat") + val timestampNTZFormatInWrite: String = +parameters.getOrElse("timestampNTZFormat", s"${DateFormatter.defaultPattern}'T'HH:mm:ss[.SSS]") + val multiLine = parameters.get("multiLine").map(_.toBoolean).getOrElse(false) /*
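The two inference rules from the commit message (any value carrying a zone offset forces `TIMESTAMP_LTZ`; NTZ-only values fall back to the session default from `spark.sql.timestampType`) can be sketched as follows. The regex and function are illustrative, not Spark's actual `JsonInferSchema` logic:

```python
import re

# A trailing 'Z' or a numeric zone offset marks a zoned (LTZ) timestamp string.
TZ_SUFFIX = re.compile(r"(Z|[+-]\d{2}:?\d{2})$")

def infer_timestamp_type(values, default_type="TIMESTAMP_NTZ"):
    """Pick a timestamp type for a column of timestamp strings:
    any zoned value forces TIMESTAMP_LTZ; otherwise use the session
    default (the stand-in for spark.sql.timestampType)."""
    has_tz = any(TZ_SUFFIX.search(v) for v in values)
    return "TIMESTAMP_LTZ" if has_tz else default_type

print(infer_timestamp_type(["2021-12-06T13:24:46"]))                             # TIMESTAMP_NTZ
print(infer_timestamp_type(["2021-12-06T13:24:46", "2021-12-06T08:24:46+05:00"]))  # TIMESTAMP_LTZ
```

The mixed-value case resolves to `TIMESTAMP_LTZ` because an LTZ value cannot be losslessly represented without its zone, while an NTZ value can always be widened.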
[spark] branch master updated (72669b5 -> 0b959b5)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 72669b5 [SPARK-37004][PYTHON] Upgrade to Py4J 0.10.9.3 add 0b959b5 [SPARK-37552][SQL] Add the `convert_timezone()` function No new revisions were added by this update. Summary of changes: .../sql/catalyst/analysis/FunctionRegistry.scala | 1 + .../catalyst/expressions/datetimeExpressions.scala | 53 ++ .../spark/sql/catalyst/util/DateTimeUtils.scala| 17 +++ .../expressions/DateExpressionsSuite.scala | 40 .../sql/catalyst/util/DateTimeUtilsSuite.scala | 24 ++ .../sql-functions/sql-expression-schema.md | 3 +- .../resources/sql-tests/inputs/timestamp-ntz.sql | 2 + .../sql-tests/results/timestamp-ntz.sql.out| 10 +++- 8 files changed, 148 insertions(+), 2 deletions(-)
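The new `convert_timezone()` function rebases a timestamp-without-time-zone from a source zone to a target zone. Its semantics can be approximated with Python's `zoneinfo`; this is a sketch of the behavior, not Spark's implementation, and Spark's exact argument handling may differ:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def convert_timezone(source_tz: str, target_tz: str, ts_ntz: datetime) -> datetime:
    """Interpret a naive (NTZ) timestamp as wall-clock time in source_tz,
    then express the same instant as wall-clock time in target_tz,
    returning another naive timestamp."""
    aware = ts_ntz.replace(tzinfo=ZoneInfo(source_tz))
    return aware.astimezone(ZoneInfo(target_tz)).replace(tzinfo=None)

# 12:00 in UTC is 15:00 in Europe/Moscow (UTC+3):
print(convert_timezone("UTC", "Europe/Moscow", datetime(2021, 12, 7, 12, 0)))
```

The key point is that the input and output are both NTZ values: only the wall-clock reading changes, because the function fixes the instant via the source zone before projecting it into the target zone.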
[spark] branch master updated (5edd959 -> c7dd2d5)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 5edd959 [SPARK-37561][SQL] Avoid loading all functions when obtaining hive's DelegationToken add c7dd2d5 [SPARK-36137][SQL][FOLLOWUP] Correct the config key in error msg No new revisions were added by this update. Summary of changes: .../main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
[spark] branch master updated (fba219c -> 5e4d664)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from fba219c [SPARK-37622][K8S] Support K8s executor rolling policy add 5e4d664 [SPARK-37591][SQL] Support the GCM mode by `aes_encrypt()`/`aes_decrypt()` No new revisions were added by this update. Summary of changes: .../catalyst/expressions/ExpressionImplUtils.java | 49 -- .../spark/sql/catalyst/expressions/misc.scala | 20 + .../sql-functions/sql-expression-schema.md | 2 +- .../apache/spark/sql/DataFrameFunctionsSuite.scala | 1 + .../org/apache/spark/sql/MiscFunctionsSuite.scala | 19 + 5 files changed, 70 insertions(+), 21 deletions(-)
[spark] branch master updated: [SPARK-37575][SQL] null values should be saved as nothing rather than quoted empty Strings "" by default settings
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6a59fba [SPARK-37575][SQL] null values should be saved as nothing rather than quoted empty Strings "" by default settings 6a59fba is described below commit 6a59fba248359fb2614837fe8781dc63ac8fdc4c Author: wayneguow AuthorDate: Tue Dec 14 11:26:34 2021 +0300 [SPARK-37575][SQL] null values should be saved as nothing rather than quoted empty Strings "" by default settings ### What changes were proposed in this pull request? Fix the bug that null values are saved as quoted empty strings "" (the same as empty strings) rather than nothing with default CSV settings since Spark 2.4. ### Why are the changes needed? This is an unexpected bug; if we don't fix it, we can't distinguish null values from empty strings in saved CSV files. As mentioned in the [Spark SQL migration guide](https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-23-to-24) (2.3 => 2.4), empty strings are saved as quoted empty strings "", and null values are saved as nothing since Spark 2.4. > Since Spark 2.4, empty strings are saved as quoted empty strings "". In version 2.3 and earlier, empty strings are equal to null values and do not reflect to any characters in saved CSV files. For example, the row of "a", null, "", 1 was written as a,,,1. Since Spark 2.4, the same row is saved as a,,"",1. To restore the previous behavior, set the CSV option emptyValue to empty (not quoted) string. But actually, we found that null values are also saved as quoted empty strings "", the same as empty strings. For the following code: ```scala Seq(("Tesla", null.asInstanceOf[String], "")) .toDF("make", "comment", "blank") .coalesce(1) .write.csv(path) ``` actual results: >Tesla,"","" expected results: >Tesla,,"" ### Does this PR introduce _any_ user-facing change?
Yes. With this bug fixed, null values are written as nothing rather than as quoted empty strings "". Users can set nullValue to "\"\"" (the same as emptyValueInWrite's default value) to restore the previous behavior since 2.4. ### How was this patch tested? Adding a test case. Closes #34853 from wayneguow/SPARK-37575. Lead-authored-by: wayneguow Co-authored-by: Wayne Guo Signed-off-by: Max Gekk --- .../apache/spark/sql/catalyst/csv/UnivocityGenerator.scala | 2 -- .../spark/sql/execution/datasources/csv/CSVSuite.scala | 13 - 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala index 10cccd5..9d65824 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala @@ -94,8 +94,6 @@ class UnivocityGenerator( while (i < row.numFields) { if (!row.isNullAt(i)) { values(i) = valueConverters(i).apply(row, i) - } else { -values(i) = options.nullValue } i += 1 } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala index 8c8079f..c7328d9 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala @@ -805,6 +805,17 @@ abstract class CSVSuite } } + test("SPARK-37575: null values should be saved as nothing rather than " + +"quoted empty Strings \"\" with default settings") { +withTempPath { path => + Seq(("Tesla", null: String, "")) +.toDF("make", "comment", "blank") +.write +.csv(path.getCanonicalPath) + checkAnswer(spark.read.text(path.getCanonicalPath), Row("Tesla,,\"\"")) +} + } +
test("save csv with compression codec option") { withTempDir { dir => val csvDir = new File(dir, "csv").getCanonicalPath @@ -1769,7 +1780,7 @@ abstract class CSVSuite (1, "John Doe"), (2, "-"), (3, "-"), -(4, "-") +(4, null) ).toDF("id", "name") checkAnswer(computed, expected)
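The fix removes the `else` branch that wrote `options.nullValue` for NULL fields, so a NULL field now emits nothing between the delimiters while a genuine empty string still round-trips as a quoted `""`. A minimal Python model of the fixed writer (illustrative only; the real generator is Univocity-based):

```python
def write_csv_row(fields, empty_value='""'):
    """Render one CSV row: None (SQL NULL) emits nothing between the
    delimiters, while '' emits the quoted empty value."""
    def fmt(value):
        if value is None:
            return ""           # NULL: nothing at all
        if value == "":
            return empty_value  # empty string: quoted ""
        return value
    return ",".join(fmt(v) for v in fields)

print(write_csv_row(["Tesla", None, ""]))  # Tesla,,""
```

This is exactly the distinction the commit message's expected output `Tesla,,""` relies on: a reader can now tell the NULL `comment` column apart from the empty `blank` column.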
[spark] branch branch-3.2 updated: [SPARK-37575][SQL] null values should be saved as nothing rather than quoted empty Strings "" by default settings
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 62e4202 [SPARK-37575][SQL] null values should be saved as nothing rather than quoted empty Strings "" by default settings 62e4202 is described below commit 62e4202b65d76b05f9f9a15819a631524c6e7985 Author: wayneguow AuthorDate: Tue Dec 14 11:26:34 2021 +0300 [SPARK-37575][SQL] null values should be saved as nothing rather than quoted empty Strings "" by default settings ### What changes were proposed in this pull request? Fix the bug that null values are saved as quoted empty strings "" (the same as empty strings) rather than nothing with default CSV settings since Spark 2.4. ### Why are the changes needed? This is an unexpected bug; if we don't fix it, we can't distinguish null values from empty strings in saved CSV files. As mentioned in the [Spark SQL migration guide](https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-23-to-24) (2.3 => 2.4), empty strings are saved as quoted empty strings "", and null values are saved as nothing since Spark 2.4. > Since Spark 2.4, empty strings are saved as quoted empty strings "". In version 2.3 and earlier, empty strings are equal to null values and do not reflect to any characters in saved CSV files. For example, the row of "a", null, "", 1 was written as a,,,1. Since Spark 2.4, the same row is saved as a,,"",1. To restore the previous behavior, set the CSV option emptyValue to empty (not quoted) string. But actually, we found that null values are also saved as quoted empty strings "", the same as empty strings.
For the following code: ```scala Seq(("Tesla", null.asInstanceOf[String], "")) .toDF("make", "comment", "blank") .coalesce(1) .write.csv(path) ``` actual results: >Tesla,"","" expected results: >Tesla,,"" ### Does this PR introduce _any_ user-facing change? Yes. With this bug fixed, null values are written as nothing rather than as quoted empty strings "". Users can set nullValue to "\"\"" (the same as emptyValueInWrite's default value) to restore the previous behavior since 2.4. ### How was this patch tested? Adding a test case. Closes #34853 from wayneguow/SPARK-37575. Lead-authored-by: wayneguow Co-authored-by: Wayne Guo Signed-off-by: Max Gekk (cherry picked from commit 6a59fba248359fb2614837fe8781dc63ac8fdc4c) Signed-off-by: Max Gekk --- .../apache/spark/sql/catalyst/csv/UnivocityGenerator.scala | 2 -- .../spark/sql/execution/datasources/csv/CSVSuite.scala | 13 - 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala index 2abf7bf..8504877 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala @@ -84,8 +84,6 @@ class UnivocityGenerator( while (i < row.numFields) { if (!row.isNullAt(i)) { values(i) = valueConverters(i).apply(row, i) - } else { -values(i) = options.nullValue } i += 1 } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala index 7efdf7c..a472221 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala @@ -804,6 +804,17 @@ abstract class CSVSuite }
} + test("SPARK-37575: null values should be saved as nothing rather than " + +"quoted empty Strings \"\" with default settings") { +withTempPath { path => + Seq(("Tesla", null: String, "")) +.toDF("make", "comment", "blank") +.write +.csv(path.getCanonicalPath) + checkAnswer(spark.read.text(path.getCanonicalPath), Row("Tesla,,\"\"")) +} + } + test("save csv with compression codec option") { withTempDir { dir => val csvDir = new File(dir, "csv").getCanonicalPath @@ -1574,7 +1585,7 @@ a
[spark] branch master updated: [SPARK-37676][SQL] Support ANSI Aggregation Function: percentile_cont
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 73789da [SPARK-37676][SQL] Support ANSI Aggregation Function: percentile_cont 73789da is described below commit 73789da962c9037bde21a53fb5826b10475658fe Author: Jiaan Geng AuthorDate: Mon Dec 27 16:12:43 2021 +0300 [SPARK-37676][SQL] Support ANSI Aggregation Function: percentile_cont ### What changes were proposed in this pull request? `PERCENTILE_CONT` is an ANSI aggregate function. Mainstream databases that support `percentile_cont` are shown below: **Postgresql** https://www.postgresql.org/docs/9.4/functions-aggregate.html **Teradata** https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/cPkFySIBORL~M938Zv07Cg **Snowflake** https://docs.snowflake.com/en/sql-reference/functions/percentile_cont.html **Oracle** https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/PERCENTILE_CONT.html#GUID-CA259452-A565-41B3-A4F4-DD74B66CEDE0 **H2** http://www.h2database.com/html/functions-aggregate.html#percentile_cont **Sybase** https://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc01776.1601/doc/html/san1278453109663.html **Exasol** https://docs.exasol.com/sql_references/functions/alphabeticallistfunctions/percentile_cont.htm **RedShift** https://docs.aws.amazon.com/redshift/latest/dg/r_PERCENTILE_CONT.html **Yellowbrick** https://www.yellowbrick.com/docs/2.2/ybd_sqlref/percentile_cont.html **Mariadb** https://mariadb.com/kb/en/percentile_cont/ **Phoenix** http://phoenix.incubator.apache.org/language/functions.html#percentile_cont **Singlestore** https://docs.singlestore.com/db/v7.6/en/reference/sql-reference/window-functions/percentile_cont-and-median.html ### Why are the changes needed? `PERCENTILE_CONT` is very useful. Exposing the expression can make the migration from other systems to Spark SQL easier.
### Does this PR introduce _any_ user-facing change? 'Yes'. New feature. ### How was this patch tested? New tests. Closes #34936 from beliefer/SPARK-37676. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- docs/sql-ref-ansi-compliance.md| 2 ++ .../apache/spark/sql/catalyst/parser/SqlBase.g4| 6 .../spark/sql/catalyst/parser/AstBuilder.scala | 15 +++- .../sql/catalyst/parser/PlanParserSuite.scala | 24 - .../test/resources/sql-tests/inputs/group-by.sql | 18 +- .../resources/sql-tests/results/group-by.sql.out | 41 +- 6 files changed, 102 insertions(+), 4 deletions(-) diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md index 7b5bde4..1b4a778 100644 --- a/docs/sql-ref-ansi-compliance.md +++ b/docs/sql-ref-ansi-compliance.md @@ -494,6 +494,7 @@ Below is a list of all the keywords in Spark SQL. |PARTITIONED|non-reserved|non-reserved|non-reserved| |PARTITIONS|non-reserved|non-reserved|non-reserved| |PERCENT|non-reserved|non-reserved|non-reserved| +|PERCENTILE_CONT|reserved|non-reserved|non-reserved| |PIVOT|non-reserved|non-reserved|non-reserved| |PLACING|non-reserved|non-reserved|non-reserved| |POSITION|non-reserved|non-reserved|reserved| @@ -594,5 +595,6 @@ Below is a list of all the keywords in Spark SQL. 
|WHERE|reserved|non-reserved|reserved| |WINDOW|non-reserved|non-reserved|reserved| |WITH|reserved|non-reserved|reserved| +|WITHIN|reserved|non-reserved|reserved| |YEAR|non-reserved|non-reserved|non-reserved| |ZONE|non-reserved|non-reserved|non-reserved| diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 6511489..5037520 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -888,6 +888,8 @@ primaryExpression FROM srcStr=valueExpression ')' #trim | OVERLAY '(' input=valueExpression PLACING replace=valueExpression FROM position=valueExpression (FOR length=valueExpression)? ')' #overlay +| PERCENTILE_CONT '(' percentage=valueExpression ')' + WITHIN GROUP '(' ORDER BY sortItem ')' #percentile ; constant @@ -1475,6 +1477,7 @@ nonReserved | PARTITION | PARTITIONED | PARTITIONS +| PERCENTILE_CONT | PERCENTLIT | PIVOT | PLACING @@ -1570,6 +1573,7 @@ nonReserved | WHERE | WINDOW | WITH +|
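`PERCENTILE_CONT` computes the value at a given fraction of the ordered distribution, interpolating linearly between the two closest rows. A standalone Python sketch of the standard continuous-percentile rule (this only shows the math; Spark's aggregate is implemented quite differently):

```python
def percentile_cont(values, fraction):
    """Continuous percentile: sort the values, locate the fractional row
    position, and linearly interpolate between its two neighbors."""
    xs = sorted(values)
    pos = fraction * (len(xs) - 1)     # continuous row position in [0, n-1]
    lo = int(pos)                      # lower neighbor index
    hi = min(lo + 1, len(xs) - 1)      # upper neighbor index (clamped)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

print(percentile_cont([1, 2, 3, 4], 0.5))   # 2.5
print(percentile_cont([10, 20, 30], 0.25))  # 15.0
```

Unlike `PERCENTILE_DISC`, which returns an actual row value, `PERCENTILE_CONT` may return a value that does not occur in the input, as the `2.5` above shows.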
[spark] branch master updated: [SPARK-34755][SQL] Support the utils for transform number format
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a6576de [SPARK-34755][SQL] Support the utils for transform number format a6576de is described below commit a6576de9719204f6a87d2fc5e2e344bd1d0017a3 Author: Jiaan Geng AuthorDate: Wed Dec 29 11:07:06 2021 +0300 [SPARK-34755][SQL] Support the utils for transform number format ### What changes were proposed in this pull request? Data Type Formatting Functions: `to_number` and `to_char` are very useful. The implementation differs considerably between `PostgreSQL`, `Oracle`, and `Phoenix`. This PR follows the implementation of `to_number` in `Oracle`, which applies strict parameter verification, and the implementation in `Phoenix`, which uses BigDecimal. This PR supports the following patterns for numeric formatting:

Pattern | Description
-- | --
9 | Value with the specified number of digits
0 | Value with leading zeros
. (period) | Decimal point
, (comma) | Group (thousand) separator
S | Sign anchored to number (uses locale)
$ | A value with a leading dollar sign
D | Decimal point (uses locale)
G | Group separator (uses locale)

Several mainstream databases support the syntax.
**PostgreSQL:** https://www.postgresql.org/docs/12/functions-formatting.html
**Oracle:** https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/TO_NUMBER.html#GUID-D4807212-AFD7-48A7-9AED-BEC3E8809866
**Vertica:** https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Formatting/TO_NUMBER.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CFormatting%20Functions%7C_7
**Redshift:** https://docs.aws.amazon.com/redshift/latest/dg/r_TO_NUMBER.html
**DB2:** https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.sqls.doc/ids_sqs_1544.htm
**Teradata:** https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/TH2cDXBn6tala29S536nqg
**Snowflake:** https://docs.snowflake.net/manuals/sql-reference/functions/to_decimal.html
**Exasol:** https://docs.exasol.com/sql_references/functions/alphabeticallistfunctions/to_number.htm#TO_NUMBER
**Phoenix:** http://phoenix.incubator.apache.org/language/functions.html#to_number
**Singlestore:** https://docs.singlestore.com/v7.3/reference/sql-reference/numeric-functions/to-number/
**Intersystems:** https://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=RSQL_TONUMBER

Note: based on an offline discussion with cloud-fan ten months ago, this PR only implements the utils for transforming number formats, because the utils should be reviewed carefully first.

### Why are the changes needed?
`to_number` and `to_char` are very useful for converting formatted currency values to numbers.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Jenkins test.

Closes #31847 from beliefer/SPARK-34755.
Lead-authored-by: Jiaan Geng Co-authored-by: gengjiaan Signed-off-by: Max Gekk --- .../spark/sql/catalyst/util/NumberUtils.scala | 189 .../spark/sql/errors/QueryCompilationErrors.scala | 8 + .../spark/sql/errors/QueryExecutionErrors.scala| 6 + .../spark/sql/catalyst/util/NumberUtilsSuite.scala | 317 + 4 files changed, 520 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberUtils.scala new file mode 100644 index 000..6efde2a --- /dev/null +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberUtils.scala @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.util + +import java.math.BigDecimal +import java.text.{DecimalFormat, NumberFormat, Pa
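The pattern table in the commit message above maps onto `java.text.DecimalFormat`, which the new `NumberUtils` imports. A minimal sketch of DecimalFormat-based parsing (not Spark's actual implementation; the SQL-to-Java pattern mapping shown is an assumption for illustration):

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

public class ToNumberSketch {
    // Parse a formatted numeric string into a BigDecimal using java.text,
    // the same underlying machinery NumberUtils builds on.
    static BigDecimal toNumber(String input, String javaPattern) {
        // Fix the symbols to a known locale so ',' groups and '.' is the decimal point
        DecimalFormat df = new DecimalFormat(javaPattern, DecimalFormatSymbols.getInstance(Locale.US));
        df.setParseBigDecimal(true); // return BigDecimal, as the Phoenix-style implementation does
        try {
            return (BigDecimal) df.parse(input);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid number format: " + input, e);
        }
    }

    public static void main(String[] args) {
        // The SQL pattern "9,999.99" corresponds roughly to the Java pattern "#,##0.##"
        System.out.println(toNumber("4,543.33", "#,##0.##")); // 4543.33
    }
}
```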
[spark] branch master updated (4d5ea5e -> 8fef5bb)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 4d5ea5e [SPARK-37153][PYTHON] Inline type hints for python/pyspark/profiler.py add 8fef5bb [SPARK-37979][SQL] Switch to more generic error classes in AES functions No new revisions were added by this update. Summary of changes: core/src/main/resources/error/error-classes.json | 19 - .../spark/sql/errors/QueryExecutionErrors.scala| 19 +++-- .../sql/errors/QueryExecutionErrorsSuite.scala | 48 -- 3 files changed, 50 insertions(+), 36 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-37937][SQL] Use error classes in the parsing errors of lateral join
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6347857 [SPARK-37937][SQL] Use error classes in the parsing errors of lateral join 6347857 is described below commit 6347857f0bad105541971283f79281c490f6bb18 Author: Terry Kim AuthorDate: Thu Feb 3 14:56:11 2022 +0300 [SPARK-37937][SQL] Use error classes in the parsing errors of lateral join ### What changes were proposed in this pull request? In the PR, I propose to use the following error classes for the parsing errors of lateral joins: - `INVALID_SQL_SYNTAX ` - `UNSUPPORTED_FEATURE ` These new error classes are added to `error-classes.json`. ### Why are the changes needed? Porting the parsing errors for lateral join to the new error framework should improve user experience with Spark SQL. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added new test suite Closes #35328 from imback82/SPARK-37937. 
Authored-by: Terry Kim Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 4 ++ .../spark/sql/errors/QueryParsingErrors.scala | 8 +-- .../sql/catalyst/parser/ErrorParserSuite.scala | 10 --- .../sql-tests/results/join-lateral.sql.out | 4 +- .../spark/sql/errors/QueryParsingErrorsSuite.scala | 81 ++ 5 files changed, 91 insertions(+), 16 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index a1ac99f..06ce22a 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -93,6 +93,10 @@ "message" : [ "The value of parameter(s) '%s' in %s is invalid: %s" ], "sqlState" : "22023" }, + "INVALID_SQL_SYNTAX" : { +"message" : [ "Invalid SQL syntax: %s" ], +"sqlState" : "42000" + }, "MAP_KEY_DOES_NOT_EXIST" : { "message" : [ "Key %s does not exist. If necessary set %s to false to bypass this error." ] }, diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala index 938bbfd..6bcd20c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala @@ -102,19 +102,19 @@ object QueryParsingErrors { } def lateralJoinWithNaturalJoinUnsupportedError(ctx: ParserRuleContext): Throwable = { -new ParseException("LATERAL join with NATURAL join is not supported", ctx) +new ParseException("UNSUPPORTED_FEATURE", Array("LATERAL join with NATURAL join."), ctx) } def lateralJoinWithUsingJoinUnsupportedError(ctx: ParserRuleContext): Throwable = { -new ParseException("LATERAL join with USING join is not supported", ctx) +new ParseException("UNSUPPORTED_FEATURE", Array("LATERAL join with USING join."), ctx) } def unsupportedLateralJoinTypeError(ctx: ParserRuleContext, joinType: String): Throwable = { -new 
ParseException(s"Unsupported LATERAL join type $joinType", ctx) +new ParseException("UNSUPPORTED_FEATURE", Array(s"LATERAL join type '$joinType'."), ctx) } def invalidLateralJoinRelationError(ctx: RelationPrimaryContext): Throwable = { -new ParseException(s"LATERAL can only be used with subquery", ctx) +new ParseException("INVALID_SQL_SYNTAX", Array("LATERAL can only be used with subquery."), ctx) } def repetitiveWindowDefinitionError(name: String, ctx: WindowClauseContext): Throwable = { diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ErrorParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ErrorParserSuite.scala index dfc5edc..99051d6 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ErrorParserSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ErrorParserSuite.scala @@ -208,14 +208,4 @@ class ErrorParserSuite extends AnalysisTest { |SELECT b """.stripMargin, 2, 9, 10, msg + " test-table") } - - test("SPARK-35789: lateral join with non-subquery relations") { -val msg = "LATERAL can only be used with subquery" -intercept("SELECT * FROM t1, LATERAL t2", msg) -intercept("SELECT * FROM t1 JOIN LATERAL t2",
[spark] branch master updated (6347857 -> b63a577)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 6347857 [SPARK-37937][SQL] Use error classes in the parsing errors of lateral join add b63a577 [SPARK-37941][SQL] Use error classes in the compilation errors of casting No new revisions were added by this update. Summary of changes: core/src/main/resources/error/error-classes.json | 3 + .../spark/sql/errors/QueryCompilationErrors.scala | 25 +++ .../sql/errors/QueryCompilationErrorsSuite.scala | 80 ++ 3 files changed, 96 insertions(+), 12 deletions(-) create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala
[spark] branch master updated: [SPARK-38105][SQL] Use error classes in the parsing errors of joins
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0d56c94 [SPARK-38105][SQL] Use error classes in the parsing errors of joins 0d56c94 is described below commit 0d56c947f10f747ab4b76426b2d6a34a1d3b8277 Author: Tengfei Huang AuthorDate: Sun Feb 6 21:19:29 2022 +0300 [SPARK-38105][SQL] Use error classes in the parsing errors of joins

### What changes were proposed in this pull request?
Migrate the following errors in QueryParsingErrors onto error classes:
1. joinCriteriaUnimplementedError => throw IllegalStateException instead, since it should never happen and is not visible to users; it was introduced by improving exhaustivity in [PR](https://github.com/apache/spark/pull/30455)
2. naturalCrossJoinUnsupportedError => UNSUPPORTED_FEATURE

### Why are the changes needed?
Porting join parsing errors to the new error framework.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
UT added.

Closes #35405 from ivoson/SPARK-38105.
Authored-by: Tengfei Huang Signed-off-by: Max Gekk --- .../scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 2 +- .../scala/org/apache/spark/sql/errors/QueryParsingErrors.scala| 6 +- .../org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala | 8 3 files changed, 10 insertions(+), 6 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index ed2623e..bd43cff 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -1146,7 +1146,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg case Some(c) if c.booleanExpression != null => (baseJoinType, Option(expression(c.booleanExpression))) case Some(c) => -throw QueryParsingErrors.joinCriteriaUnimplementedError(c, ctx) +throw new IllegalStateException(s"Unimplemented joinCriteria: $c") case None if join.NATURAL != null => if (join.LATERAL != null) { throw QueryParsingErrors.lateralJoinWithNaturalJoinUnsupportedError(ctx) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala index 6bcd20c..6d7ed7b 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala @@ -129,12 +129,8 @@ object QueryParsingErrors { new ParseException(s"Cannot resolve window reference '$name'", ctx) } - def joinCriteriaUnimplementedError(join: JoinCriteriaContext, ctx: RelationContext): Throwable = { -new ParseException(s"Unimplemented joinCriteria: $join", ctx) - } - def naturalCrossJoinUnsupportedError(ctx: RelationContext): Throwable = { -new ParseException("NATURAL CROSS JOIN is not supported", ctx) 
+new ParseException("UNSUPPORTED_FEATURE", Array("NATURAL CROSS JOIN."), ctx) } def emptyInputForTableSampleError(ctx: ParserRuleContext): Throwable = { diff --git a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala index 1a213bf..03117b9 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala @@ -78,4 +78,12 @@ class QueryParsingErrorsSuite extends QueryTest with SharedSparkSession { message = "Invalid SQL syntax: LATERAL can only be used with subquery.") } } + + test("UNSUPPORTED_FEATURE: NATURAL CROSS JOIN is not supported") { +validateParsingError( + sqlText = "SELECT * FROM a NATURAL CROSS JOIN b", + errorClass = "UNSUPPORTED_FEATURE", + sqlState = "0A000", + message = "The feature is not supported: NATURAL CROSS JOIN.") + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
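These migrations all follow the same mechanics: `error-classes.json` holds a message template with `%s` placeholders, and the exception constructor receives the error class name plus a parameter array that fills the template. A cut-down sketch of that lookup-and-format step (registry entries taken from the diffs above; class and method names here are hypothetical, not Spark's):

```java
import java.util.Map;

public class ErrorClassSketch {
    // Tiny stand-in for error-classes.json: error class name -> message template.
    static final Map<String, String> MESSAGES = Map.of(
        "UNSUPPORTED_FEATURE", "The feature is not supported: %s",
        "INVALID_SQL_SYNTAX", "Invalid SQL syntax: %s");

    // Resolve the template for an error class and substitute the parameters.
    static String format(String errorClass, Object... params) {
        return String.format(MESSAGES.get(errorClass), params);
    }

    public static void main(String[] args) {
        // Mirrors the expected message in the NATURAL CROSS JOIN test above
        System.out.println(format("UNSUPPORTED_FEATURE", "NATURAL CROSS JOIN."));
        // The feature is not supported: NATURAL CROSS JOIN.
    }
}
```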
[spark] branch master updated (f62b36c -> 65c0bdf)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f62b36c [SPARK-38128][PYTHON][TESTS] Show full stacktrace in tests by default in PySpark tests add 65c0bdf [SPARK-38126][SQL][TESTS] Check the whole message of error classes No new revisions were added by this update. Summary of changes: .../sql/errors/QueryCompilationErrorsSuite.scala | 9 ++- .../sql/errors/QueryExecutionErrorsSuite.scala | 12 ++-- .../spark/sql/errors/QueryParsingErrorsSuite.scala | 64 +- 3 files changed, 62 insertions(+), 23 deletions(-)
[spark] branch master updated (2e703ae -> 08c851d)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 2e703ae [SPARK-38030][SQL] Canonicalization should not remove nullability of AttributeReference dataType add 08c851d [SPARK-37943][SQL] Use error classes in the compilation errors of grouping No new revisions were added by this update. Summary of changes: core/src/main/resources/error/error-classes.json | 3 ++ .../spark/sql/errors/QueryCompilationErrors.scala | 4 ++- .../sql/errors/QueryCompilationErrorsSuite.scala | 35 ++ 3 files changed, 41 insertions(+), 1 deletion(-)
[spark] branch master updated (5f0a92c -> 7688d839)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 5f0a92c [SPARK-38157][SQL] Explicitly set ANSI to false in test timestampNTZ/timestamp.sql and SQLQueryTestSuite to match the expected golden results add 7688d839 [SPARK-38113][SQL] Use error classes in the execution errors of pivoting No new revisions were added by this update. Summary of changes: .../spark/sql/errors/QueryExecutionErrors.scala| 8 +-- .../sql/errors/QueryExecutionErrorsSuite.scala | 27 +- 2 files changed, 32 insertions(+), 3 deletions(-)
[spark] branch master updated (7688d839 -> 53ba6e2)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 7688d839 [SPARK-38113][SQL] Use error classes in the execution errors of pivoting add 53ba6e2 [SPARK-38131][SQL] Use error classes in user-facing exceptions only No new revisions were added by this update. Summary of changes: core/src/main/resources/error/error-classes.json| 4 .../spark/sql/catalyst/analysis/Analyzer.scala | 3 ++- .../sql/catalyst/expressions/csvExpressions.scala | 2 +- .../spark/sql/errors/QueryCompilationErrors.scala | 8 .../spark/sql/errors/QueryExecutionErrors.scala | 5 - .../sql/errors/QueryCompilationErrorsSuite.scala| 21 + 6 files changed, 4 insertions(+), 39 deletions(-)
[spark] branch master updated (17653fb -> 3d285c1)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 17653fb [SPARK-37401][PYTHON][ML] Inline typehints for pyspark.ml.clustering add 3d285c1 [SPARK-38123][SQL] Unified use `DataType` as `targetType` of `QueryExecutionErrors#castingCauseOverflowError` No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/expressions/Cast.scala | 60 -- .../spark/sql/catalyst/util/IntervalUtils.scala| 18 +++ .../spark/sql/errors/QueryExecutionErrors.scala| 4 +- .../scala/org/apache/spark/sql/types/Decimal.scala | 14 ++--- .../org/apache/spark/sql/types/numerics.scala | 10 ++-- .../sql-tests/results/postgreSQL/float4.sql.out| 2 +- .../sql-tests/results/postgreSQL/float8.sql.out| 2 +- .../sql-tests/results/postgreSQL/int8.sql.out | 2 +- 8 files changed, 59 insertions(+), 53 deletions(-)
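`castingCauseOverflowError` is raised when an ANSI-mode cast cannot represent a value in the target `DataType`. A minimal illustration of such an overflow check outside Spark (helper names here are hypothetical):

```java
public class CastOverflowSketch {
    // ANSI-style cast: narrowing a long to int must fail loudly when the value
    // does not fit, instead of silently wrapping as a plain (int) cast would.
    static int castLongToInt(long v) {
        if (v > Integer.MAX_VALUE || v < Integer.MIN_VALUE) {
            throw new ArithmeticException("Casting " + v + " to int causes overflow");
        }
        return (int) v;
    }

    public static void main(String[] args) {
        System.out.println(castLongToInt(42L)); // 42
        try {
            castLongToInt(3_000_000_000L); // exceeds Integer.MAX_VALUE
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage());
        }
    }
}
```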
[spark] branch master updated: [SPARK-38198][SQL] Fix `QueryExecution.debug#toFile` use the passed in `maxFields` when `explainMode` is `CodegenMode`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ff92e85 [SPARK-38198][SQL] Fix `QueryExecution.debug#toFile` use the passed in `maxFields` when `explainMode` is `CodegenMode` ff92e85 is described below commit ff92e85f86d3e36428996695001a23893d406b76 Author: yangjie01 AuthorDate: Mon Feb 14 13:28:11 2022 +0300 [SPARK-38198][SQL] Fix `QueryExecution.debug#toFile` use the passed in `maxFields` when `explainMode` is `CodegenMode`

### What changes were proposed in this pull request?
The `QueryExecution.debug#toFile` method supports passing in `maxFields`, and this parameter is passed down when `explainMode` is `SimpleMode`, `ExtendedMode`, or `CostMode`. But the passed-down `maxFields` was ignored when `explainMode` is `CostMode`, because `QueryExecution#stringWithStats` currently overrides it with `SQLConf.get.maxToStringFields`. This PR removes the override so that the passed-in `maxFields` takes effect.

### Why are the changes needed?
Bug fix

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GA and add a new test case

Closes #35506 from LuciferYang/SPARK-38198.
Authored-by: yangjie01 Signed-off-by: Max Gekk --- .../apache/spark/sql/execution/QueryExecution.scala| 2 -- .../spark/sql/execution/QueryExecutionSuite.scala | 18 ++ 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala index 26c6904..1b08994 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala @@ -304,8 +304,6 @@ class QueryExecution( } private def stringWithStats(maxFields: Int, append: String => Unit): Unit = { -val maxFields = SQLConf.get.maxToStringFields - // trigger to compute stats for logical plans try { // This will trigger to compute stats for all the nodes in the plan, including subqueries, diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala index ecc448f..2c58b53 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala @@ -261,4 +261,22 @@ class QueryExecutionSuite extends SharedSparkSession { val cmdResultExec = projectQe.executedPlan.asInstanceOf[CommandResultExec] assert(cmdResultExec.commandPhysicalPlan.isInstanceOf[ShowTablesExec]) } + + test("SPARK-38198: check specify maxFields when call toFile method") { +withTempDir { dir => + val path = dir.getCanonicalPath + "/plans.txt" + // Define a dataset with 6 columns + val ds = spark.createDataset(Seq((0, 1, 2, 3, 4, 5), (6, 7, 8, 9, 10, 11))) + // `CodegenMode` and `FormattedMode` doesn't use the maxFields, so not tested in this case + Seq(SimpleMode.name, ExtendedMode.name, CostMode.name).foreach { modeName => +val maxFields = 3 +ds.queryExecution.debug.toFile(path, explainMode 
= Some(modeName), maxFields = maxFields) +Utils.tryWithResource(Source.fromFile(path)) { source => + val tableScan = source.getLines().filter(_.contains("LocalTableScan")) + assert(tableScan.exists(_.contains("more fields")), +s"Specify maxFields = $maxFields doesn't take effect when explainMode is $modeName") +} + } +} + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
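The `maxFields` knob exercised by the new test bounds how many fields a plan node prints before collapsing the rest into a "more fields" marker. A rough sketch of that truncation idea (hypothetical helper; Spark's real one is `Utils.truncatedString`, whose exact formatting differs):

```java
import java.util.List;

public class TruncateFields {
    // Show at most maxFields entries, then summarize the remainder,
    // so huge schemas do not blow up explain output.
    static String truncatedString(List<String> fields, int maxFields) {
        if (fields.size() <= maxFields) {
            return String.join(", ", fields);
        }
        int hidden = fields.size() - maxFields;
        return String.join(", ", fields.subList(0, maxFields)) + ", ... " + hidden + " more fields";
    }

    public static void main(String[] args) {
        // A 6-column scan printed with maxFields = 3, as in the test above
        System.out.println(truncatedString(List.of("a", "b", "c", "d", "e", "f"), 3));
        // a, b, c, ... 3 more fields
    }
}
```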
[spark] branch master updated (ff92e85 -> c8b34ab)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from ff92e85 [SPARK-38198][SQL] Fix `QueryExecution.debug#toFile` use the passed in `maxFields` when `explainMode` is `CodegenMode` add c8b34ab [SPARK-38097][SQL][TESTS] Improved the error message for pivoting unsupported column No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala | 8 .../scala/org/apache/spark/sql/RelationalGroupedDataset.scala| 9 - .../org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala | 7 +++ 3 files changed, 19 insertions(+), 5 deletions(-)
[spark] branch branch-3.2 updated (75c7726 -> 940ac0c)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git. from 75c7726 [SPARK-37498][PYTHON] Add eventually for test_reuse_worker_of_parallelize_range add 940ac0c [SPARK-38198][SQL][3.2] Fix QueryExecution.debug#toFile use the passed in maxFields when explainMode is CodegenMode No new revisions were added by this update. Summary of changes: .../apache/spark/sql/execution/QueryExecution.scala| 2 -- .../spark/sql/execution/QueryExecutionSuite.scala | 18 ++ 2 files changed, 18 insertions(+), 2 deletions(-)
[spark] branch master updated (ea1f922 -> a9a792b3)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from ea1f922 [SPARK-37707][SQL][FOLLOWUP] Allow implicitly casting Date type to AnyTimestampType under ANSI mode add a9a792b3 [SPARK-38199][SQL] Delete the unused `dataType` specified in the definition of `IntervalColumnAccessor` No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/execution/columnar/ColumnAccessor.scala | 2 +- .../apache/spark/sql/execution/columnar/GenerateColumnAccessor.scala| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
[spark] branch master updated (837248a -> 3a7eafd)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 837248a [MINOR][DOC] Fix documentation for structured streaming - addListener add 3a7eafd [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function No new revisions were added by this update. Summary of changes: docs/sql-ref-ansi-compliance.md| 1 + .../apache/spark/sql/catalyst/parser/SqlBase.g4| 4 ++ .../sql/catalyst/analysis/FunctionRegistry.scala | 1 + .../catalyst/expressions/datetimeExpressions.scala | 84 ++ .../spark/sql/catalyst/parser/AstBuilder.scala | 11 +++ .../spark/sql/catalyst/util/DateTimeUtils.scala| 36 ++ .../spark/sql/errors/QueryExecutionErrors.scala| 6 ++ .../expressions/DateExpressionsSuite.scala | 62 .../sql/catalyst/util/DateTimeUtilsSuite.scala | 36 +- .../sql-functions/sql-expression-schema.md | 3 +- .../test/resources/sql-tests/inputs/timestamp.sql | 6 ++ .../sql-tests/results/ansi/timestamp.sql.out | 34 - .../sql-tests/results/datetime-legacy.sql.out | 34 - .../resources/sql-tests/results/timestamp.sql.out | 34 - .../results/timestampNTZ/timestamp-ansi.sql.out| 34 - .../results/timestampNTZ/timestamp.sql.out | 34 - .../sql/errors/QueryExecutionErrorsSuite.scala | 12 +++- 17 files changed, 424 insertions(+), 8 deletions(-)
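`TIMESTAMPADD(unit, quantity, ts)`, added by the commit above, shifts a timestamp by an integral number of the given unit. A rough sketch of the semantics using `java.time` (hypothetical helper; Spark supports more units and reports unknown units through `QueryExecutionErrors`):

```java
import java.time.LocalDateTime;

public class TimestampAddSketch {
    // Dispatch on the unit keyword and delegate to java.time's calendar-aware
    // arithmetic (months and years respect varying month lengths).
    static LocalDateTime timestampAdd(String unit, long quantity, LocalDateTime ts) {
        switch (unit.toUpperCase()) {
            case "YEAR":   return ts.plusYears(quantity);
            case "MONTH":  return ts.plusMonths(quantity);
            case "DAY":    return ts.plusDays(quantity);
            case "HOUR":   return ts.plusHours(quantity);
            case "MINUTE": return ts.plusMinutes(quantity);
            case "SECOND": return ts.plusSeconds(quantity);
            default: throw new IllegalArgumentException("Unknown unit: " + unit);
        }
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2022, 2, 14, 13, 28, 11);
        System.out.println(timestampAdd("MONTH", 1, ts)); // 2022-03-14T13:28:11
    }
}
```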