(spark-website) branch asf-site updated: add a behavior change guideline (#518)
wenchen pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 93d08b7bff add a behavior change guideline (#518)

93d08b7bff is described below

commit 93d08b7bff0875917e59a0158ed1daf794ddff99
Author: Wenchen Fan
AuthorDate: Sat Jun 8 07:12:35 2024 +0800

    add a behavior change guideline (#518)

    * behavior change guide
    * Apply suggestions from code review
      Co-authored-by: Niranjan
    * address comments
    * address comments

    Co-authored-by: Niranjan
---
 contributing.md        | 31 +++
 site/contributing.html | 33 +
 2 files changed, 64 insertions(+)

diff --git a/contributing.md b/contributing.md
index 8f0ec49869..06f5fdf5b9 100644
--- a/contributing.md
+++ b/contributing.md
@@ -209,6 +209,37 @@ When writing error messages, you should:

 See the error message guidelines for more details.

+Behavior changes
+
+Behavior changes are user-visible functional changes in a new release via public APIs. The term 'user' here refers
+not only to those who write queries and/or develop Spark plugins, but also to those who deploy and/or manage Spark
+clusters. New features and bug fixes, such as correcting query results or schemas and failing unsupported queries
+that previously returned incorrect results, are considered behavior changes. However, performance improvements,
+code refactoring, and changes to unreleased APIs/features are not.
+
+Everyone makes mistakes, including Spark developers. We will continue to fix defects in Spark as they arise.
+However, it is important to communicate these behavior changes so that Spark users can be prepared for version
+upgrades. If a PR introduces behavior changes, it should be explicitly mentioned in the PR description. If the
+behavior change may require additional user actions, this should be highlighted in the migration guide
+(docs/sql-migration-guide.md for the SQL component and similar files for other components). Where possible,
+provide options to restore the previous behavior and mention these options in the error message. Some examples include:
+
+- Bug fixes that change query results. Users may need to backfill to correct existing data and must be informed about
+these correctness fixes.
+- Bug fixes that change the query schema. Users may need to update the schema of tables in their data pipelines and must
+be informed about these changes.
+- Removing or renaming Spark configurations.
+- Renaming error classes or conditions.
+- Any non-additive changes to the public Python/SQL/Scala/Java/R APIs (including developer APIs), such as renaming
+functions, removing parameters, adding parameters, renaming parameters, or changing parameter default values. These
+changes should generally be avoided, or if necessary, done in a binary-compatible manner by deprecating the old function
+and introducing a new one instead.
+- Any non-additive changes to the way Spark should be deployed and managed: renaming argument names in deployment scripts,
+updates to the REST API, changes to the method of loading configuration files, etc.
+
+This list is not meant to be comprehensive. Anyone reviewing a PR can ask the PR author to add to the migration guide
+if they believe the change is risky and may disrupt users during an upgrade.
+
 Code review criteria

 Before considering how to contribute code, it's useful to understand how code is reviewed,

diff --git a/site/contributing.html b/site/contributing.html
index 47d4d8d662..aeaeeceb82 100644
--- a/site/contributing.html
+++ b/site/contributing.html
@@ -362,6 +362,39 @@ error messages.

 See the error message guidelines for more details.
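The guideline's advice to "provide options to restore the previous behavior and mention these options in the error message" usually takes the form of a legacy flag. A minimal sketch following the builder pattern used in org.apache.spark.sql.internal.SQLConf — the config name, doc text, and version below are invented for illustration:

```
// Hypothetical legacy-behavior flag, in the style of SQLConf's existing
// spark.sql.legacy.* entries; not part of this commit.
val LEGACY_EXAMPLE_BEHAVIOR = buildConf("spark.sql.legacy.exampleBehavior")
  .internal()
  .doc("When true, restores the previous (pre-fix) behavior of the example feature. " +
    "Mentioned in the error message so users can opt out after an upgrade.")
  .version("4.0.0")
  .booleanConf
  .createWithDefault(false)
```

Defaulting such a flag to false keeps the new behavior while giving upgrading users a documented escape hatch that error messages and the migration guide can point to.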
(spark) branch master updated (d81b1e3d358c -> 8911d59005e8)
wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

    from d81b1e3d358c [SPARK-48559][SQL] Fetch globalTempDatabase name directly without invoking initialization of GlobalTempViewManager
     add 8911d59005e8 [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable and Fix UT

No new revisions were added by this update.

Summary of changes:
 .../src/main/resources/error/error-conditions.json |  5 +++
 .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala  | 36 ++
 .../datasources/v2/jdbc/JDBCTableCatalog.scala     | 13 +---
 3 files changed, 30 insertions(+), 24 deletions(-)
(spark) branch master updated (87b0f5995383 -> d81b1e3d358c)
wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

    from 87b0f5995383 [SPARK-48561][PS][CONNECT] Throw `PandasNotImplementedError` for unsupported plotting functions
     add d81b1e3d358c [SPARK-48559][SQL] Fetch globalTempDatabase name directly without invoking initialization of GlobalTempViewManager

No new revisions were added by this update.

Summary of changes:
 .../catalyst/catalog/GlobalTempViewManager.scala   |  2 +-
 .../sql/catalyst/catalog/SessionCatalog.scala      | 35 +++---
 .../org/apache/spark/sql/internal/SQLConf.scala    |  2 ++
 .../sql/catalyst/catalog/SessionCatalogSuite.scala | 20 ++---
 .../execution/command/AnalyzeColumnCommand.scala   |  2 +-
 .../apache/spark/sql/internal/SharedState.scala    |  3 +-
 .../org/apache/spark/sql/CachedTableSuite.scala    |  8 ++---
 .../spark/sql/StatisticsCollectionSuite.scala      |  2 +-
 .../spark/sql/execution/GlobalTempViewSuite.scala  |  2 +-
 .../apache/spark/sql/execution/SQLViewSuite.scala  |  6 ++--
 .../spark/sql/execution/SQLViewTestSuite.scala     |  2 +-
 .../command/AlterTableDropPartitionSuiteBase.scala |  2 +-
 .../AlterTableRenamePartitionSuiteBase.scala       |  2 +-
 .../spark/sql/execution/command/DDLSuite.scala     |  2 +-
 .../execution/command/TruncateTableSuiteBase.scala |  4 +--
 .../command/v1/AlterTableAddPartitionSuite.scala   |  2 +-
 .../command/v2/AlterTableAddPartitionSuite.scala   |  2 +-
 .../thriftserver/SparkGetColumnsOperation.scala    |  2 +-
 .../thriftserver/SparkGetSchemasOperation.scala    |  2 +-
 .../thriftserver/SparkGetTablesOperation.scala     |  2 +-
 .../ThriftServerWithSparkContextSuite.scala        |  2 +-
 .../spark/sql/hive/HiveSharedStateSuite.scala      |  2 +-
 22 files changed, 56 insertions(+), 52 deletions(-)
(spark) branch branch-3.5 updated: [SPARK-48286] Fix analysis of column with exists default expression - Add user facing error
wenchen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new a00c11546273 [SPARK-48286] Fix analysis of column with exists default expression - Add user facing error

a00c11546273 is described below

commit a00c11546273089dbfa993fa4c170eb70beecbc3
Author: Uros Stankovic
AuthorDate: Thu Jun 6 13:08:48 2024 -0700

    [SPARK-48286] Fix analysis of column with exists default expression - Add user facing error

    FIRST CHANGE
    Pass the correct parameter list to `org.apache.spark.sql.catalyst.util.ResolveDefaultColumns#analyze`
    when it is invoked from `org.apache.spark.sql.connector.catalog.CatalogV2Util#structFieldToV2Column`.
    The `analyze` method accepts 3 parameters:
    1) Field to analyze
    2) Statement type - String
    3) Metadata key - CURRENT_DEFAULT or EXISTS_DEFAULT
    `structFieldToV2Column` passed `fieldToAnalyze` and `EXISTS_DEFAULT` as the second parameter, so it
    was treated as the statement type rather than the metadata key, and a different expression was analyzed.

    Pull requests where the original change was introduced:
    https://github.com/apache/spark/pull/40049 - Initial commit
    https://github.com/apache/spark/pull/44876 - Refactor that did not touch the issue
    https://github.com/apache/spark/pull/44935 - Another refactor that did not touch the issue

    SECOND CHANGE
    Add a user-facing exception when the default value is not foldable or resolved.
    Otherwise, the user would see the message "You hit a bug in Spark ...".

    It is needed to pass the correct value to the `Column` object.

    Yes, this is a bug fix: the existence default value now has the proper expression,
    but before this change, the existence default value was actually the current default
    value of the column.

    Unit test

    No

    Closes #46594 from urosstan-db/SPARK-48286-Analyze-exists-default-expression-instead-of-current-default-expression.

    Lead-authored-by: Uros Stankovic
    Co-authored-by: Uros Stankovic <155642965+urosstan...@users.noreply.github.com>
    Signed-off-by: Wenchen Fan
    (cherry picked from commit 0f21df0b29cc18f0e0c7b12543f3a037e4032e65)
    Signed-off-by: Wenchen Fan
---
 .../catalyst/util/ResolveDefaultColumnsUtil.scala  | 16 +++
 .../sql/connector/catalog/CatalogV2Util.scala      |  7 ++-
 .../DataSourceV2DataFrameSessionCatalogSuite.scala |  9 +++-
 .../spark/sql/connector/DataSourceV2SQLSuite.scala | 24 ++
 4 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala
index 50ff3eeab0c1..f55fa2d8f5e8 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala
@@ -279,6 +279,7 @@ object ResolveDefaultColumns extends QueryErrorsBase with ResolveDefaultColumnsU
       throw QueryCompilationErrors.defaultValuesMayNotContainSubQueryExpressions(
         statementType, colName, defaultSQL)
     }
+    // Analyze the parse result.
     val plan = try {
       val analyzer: Analyzer = DefaultColumnAnalyzer
@@ -293,6 +294,21 @@ object ResolveDefaultColumns extends QueryErrorsBase with ResolveDefaultColumnsU
     val analyzed: Expression = plan.collectFirst {
       case Project(Seq(a: Alias), OneRowRelation()) => a.child
     }.get
+
+    if (!analyzed.foldable) {
+      throw QueryCompilationErrors.defaultValueNotConstantError(statementType, colName, defaultSQL)
+    }
+
+    // Another extra check, expressions should already be resolved if AnalysisException is not
+    // thrown in the code block above
+    if (!analyzed.resolved) {
+      throw QueryCompilationErrors.defaultValuesUnresolvedExprError(
+        statementType,
+        colName,
+        defaultSQL,
+        cause = null)
+    }
+
     // Perform implicit coercion from the provided expression type to the required column type.
     if (dataType == analyzed.dataType) {
       analyzed
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
index be569b1de9db..47c438f154ab 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
@@ -512,10 +512,15 @@ private[sql] object CatalogV2Util {
     }

     if (isDefaultCol
(spark) branch master updated: [SPARK-48286] Fix analysis of column with exists default expression - Add user facing error
wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 0f21df0b29cc [SPARK-48286] Fix analysis of column with exists default expression - Add user facing error

0f21df0b29cc is described below

commit 0f21df0b29cc18f0e0c7b12543f3a037e4032e65
Author: Uros Stankovic
AuthorDate: Thu Jun 6 13:08:48 2024 -0700

    [SPARK-48286] Fix analysis of column with exists default expression - Add user facing error

    ### What changes were proposed in this pull request?
    FIRST CHANGE
    Pass the correct parameter list to `org.apache.spark.sql.catalyst.util.ResolveDefaultColumns#analyze`
    when it is invoked from `org.apache.spark.sql.connector.catalog.CatalogV2Util#structFieldToV2Column`.
    The `analyze` method accepts 3 parameters:
    1) Field to analyze
    2) Statement type - String
    3) Metadata key - CURRENT_DEFAULT or EXISTS_DEFAULT
    `structFieldToV2Column` passed `fieldToAnalyze` and `EXISTS_DEFAULT` as the second parameter, so it
    was treated as the statement type rather than the metadata key, and a different expression was analyzed.

    Pull requests where the original change was introduced:
    https://github.com/apache/spark/pull/40049 - Initial commit
    https://github.com/apache/spark/pull/44876 - Refactor that did not touch the issue
    https://github.com/apache/spark/pull/44935 - Another refactor that did not touch the issue

    SECOND CHANGE
    Add a user-facing exception when the default value is not foldable or resolved.
    Otherwise, the user would see the message "You hit a bug in Spark ...".

    ### Why are the changes needed?
    It is needed to pass the correct value to the `Column` object.

    ### Does this PR introduce _any_ user-facing change?
    Yes, this is a bug fix: the existence default value now has the proper expression,
    but before this change, the existence default value was actually the current default
    value of the column.

    ### How was this patch tested?
    Unit test

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #46594 from urosstan-db/SPARK-48286-Analyze-exists-default-expression-instead-of-current-default-expression.

    Lead-authored-by: Uros Stankovic
    Co-authored-by: Uros Stankovic <155642965+urosstan...@users.noreply.github.com>
    Signed-off-by: Wenchen Fan
---
 .../catalyst/util/ResolveDefaultColumnsUtil.scala  | 16 +++
 .../sql/connector/catalog/CatalogV2Util.scala      |  7 ++-
 .../DataSourceV2DataFrameSessionCatalogSuite.scala |  9 +++-
 .../spark/sql/connector/DataSourceV2SQLSuite.scala | 24 ++
 4 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala
index d73e2ca6bd9d..ad104b6e0c76 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala
@@ -284,6 +284,7 @@ object ResolveDefaultColumns extends QueryErrorsBase
       throw QueryCompilationErrors.defaultValuesMayNotContainSubQueryExpressions(
         statementType, colName, defaultSQL)
     }
+    // Analyze the parse result.
     val plan = try {
       val analyzer: Analyzer = DefaultColumnAnalyzer
@@ -298,6 +299,21 @@ object ResolveDefaultColumns extends QueryErrorsBase
     val analyzed: Expression = plan.collectFirst {
       case Project(Seq(a: Alias), OneRowRelation()) => a.child
     }.get
+
+    if (!analyzed.foldable) {
+      throw QueryCompilationErrors.defaultValueNotConstantError(statementType, colName, defaultSQL)
+    }
+
+    // Another extra check, expressions should already be resolved if AnalysisException is not
+    // thrown in the code block above
+    if (!analyzed.resolved) {
+      throw QueryCompilationErrors.defaultValuesUnresolvedExprError(
+        statementType,
+        colName,
+        defaultSQL,
+        cause = null)
+    }
+
     // Perform implicit coercion from the provided expression type to the required column type.
     coerceDefaultValue(analyzed, dataType, statementType, colName, defaultSQL)
   }
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
index 5485f5255b6e..f36310e8ad89 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sq
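A minimal sketch of the call-site bug the two emails above describe. The signature shape follows the commit message; the default value for the metadata key, the statement-type string, and the stub bodies are assumptions made for illustration:

```
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.types.StructField

// Simplified stand-in for ResolveDefaultColumns#analyze (stub body).
def analyze(
    field: StructField,
    statementType: String,
    metadataKey: String = "CURRENT_DEFAULT"): Expression = ???

// Before the fix: EXISTS_DEFAULT was passed positionally as statementType, so the
// metadata key silently stayed CURRENT_DEFAULT and the wrong expression was analyzed.
def buggyCall(field: StructField): Expression =
  analyze(field, "EXISTS_DEFAULT")

// After the fix: statement type and metadata key land in the right positions.
def fixedCall(field: StructField): Expression =
  analyze(field, "ALTER TABLE", "EXISTS_DEFAULT")
```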
(spark) branch master updated: [SPARK-48283][SQL] Modify string comparison for UTF8_BINARY_LCASE
wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 84fa0527834b [SPARK-48283][SQL] Modify string comparison for UTF8_BINARY_LCASE

84fa0527834b is described below

commit 84fa0527834b947ad12e4a6398512c75929cc99b
Author: Uros Bojanic <157381213+uros...@users.noreply.github.com>
AuthorDate: Thu Jun 6 12:05:28 2024 -0700

    [SPARK-48283][SQL] Modify string comparison for UTF8_BINARY_LCASE

    ### What changes were proposed in this pull request?
    String comparison and hashing in UTF8_BINARY_LCASE are now context-unaware, and use ICU root locale rules to convert strings to lowercase at the code point level, taking into consideration special cases for one-to-many case mappings. For example: comparing "ΘΑΛΑΣΣΙΝΟΣ" and "θαλασσινοσ" under UTF8_BINARY_LCASE now returns true, because Greek final sigma is special-cased in the new comparison implementation.

    ### Why are the changes needed?
    1. UTF8_BINARY_LCASE should use ICU root locale rules (instead of the JVM)
    2. comparing strings under UTF8_BINARY_LCASE should be context-insensitive

    ### Does this PR introduce _any_ user-facing change?
    Yes, comparing strings under UTF8_BINARY_LCASE will now give different results in two kinds of special cases (the Turkish dotted letter "i" and the Greek final letter "sigma").

    ### How was this patch tested?
    Unit tests in `CollationSupportSuite`.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #46700 from uros-db/lcase-casing.

    Authored-by: Uros Bojanic <157381213+uros...@users.noreply.github.com>
    Signed-off-by: Wenchen Fan
---
 .../catalyst/util/CollationAwareUTF8String.java    |  90 
 .../spark/sql/catalyst/util/CollationFactory.java  |   4 +-
 .../org/apache/spark/unsafe/types/UTF8String.java  |  30 +---
 .../spark/unsafe/types/CollationSupportSuite.java  | 151 +
 .../apache/spark/unsafe/types/UTF8StringSuite.java |  23 
 5 files changed, 244 insertions(+), 54 deletions(-)

diff --git a/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java b/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java
index cf3b5c86dcf6..056b202bc398 100644
--- a/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java
+++ b/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java
@@ -183,6 +183,54 @@ public class CollationAwareUTF8String {
     return MATCH_NOT_FOUND;
   }

+  /**
+   * Lowercase UTF8String comparison used for UTF8_BINARY_LCASE collation. While the default
+   * UTF8String comparison is equivalent to a.toLowerCase().binaryCompare(b.toLowerCase()), this
+   * method uses code points to compare the strings in a case-insensitive manner using ICU rules,
+   * as well as handling special rules for one-to-many case mappings (see: lowerCaseCodePoints).
+   *
+   * @param left The first UTF8String to compare.
+   * @param right The second UTF8String to compare.
+   * @return An integer representing the comparison result.
+   */
+  public static int compareLowerCase(final UTF8String left, final UTF8String right) {
+    // Only if both strings are ASCII, we can use faster comparison (no string allocations).
+    if (left.isFullAscii() && right.isFullAscii()) {
+      return compareLowerCaseAscii(left, right);
+    }
+    return compareLowerCaseSlow(left, right);
+  }
+
+  /**
+   * Fast version of the `compareLowerCase` method, used when both arguments are ASCII strings.
+   *
+   * @param left The first ASCII UTF8String to compare.
+   * @param right The second ASCII UTF8String to compare.
+   * @return An integer representing the comparison result.
+   */
+  private static int compareLowerCaseAscii(final UTF8String left, final UTF8String right) {
+    int leftBytes = left.numBytes(), rightBytes = right.numBytes();
+    for (int curr = 0; curr < leftBytes && curr < rightBytes; curr++) {
+      int lowerLeftByte = Character.toLowerCase(left.getByte(curr));
+      int lowerRightByte = Character.toLowerCase(right.getByte(curr));
+      if (lowerLeftByte != lowerRightByte) {
+        return lowerLeftByte - lowerRightByte;
+      }
+    }
+    return leftBytes - rightBytes;
+  }
+
+  /**
+   * Slow version of the `compareLowerCase` method, used when both arguments are non-ASCII strings.
+   *
+   * @param left The first non-ASCII UTF8String to compare.
+   * @param right The second non-ASCII UTF8String to compare.
+   * @return An integer representing the comparison result.
+   */
+  private static int compareLowerCaseSlow(final UTF8String left, final UTF8S
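A rough illustration of why context-sensitive lowercasing breaks this comparison, using plain JVM string APIs rather than the Spark implementation (the Spark version additionally normalizes one-to-many mappings such as ς vs. σ):

```
import java.util.Locale

val upper = "ΘΑΛΑΣΣΙΝΟΣ"
val lower = "θαλασσινοσ"

// Context-aware lowercasing maps the word-final Σ to final sigma "ς",
// so the strings differ in the last character.
upper.toLowerCase(Locale.ROOT) == lower  // false: yields "θαλασσινος"

// Context-free, code-point-level lowercasing maps every Σ to "σ",
// which is the behavior UTF8_BINARY_LCASE now approximates.
upper.map(c => Character.toLowerCase(c)) == lower  // true
```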
(spark) branch master updated (3878b57e6e88 -> b5a4b3200362)
wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

    from 3878b57e6e88 [SPARK-48526][SS] Allow passing custom sink to testStream()
     add b5a4b3200362 [SPARK-48435][SQL] UNICODE collation should not support binary equality

No new revisions were added by this update.

Summary of changes:
 .../catalyst/util/CollationAwareUTF8String.java    |  5 +-
 .../spark/sql/catalyst/util/CollationFactory.java  |  2 +-
 .../spark/unsafe/types/CollationSupportSuite.java  | 36 +--
 .../spark/unsafe/types/CollationFactorySuite.scala | 10 ++-
 .../expressions/CollationExpressionSuite.scala     |  8 +--
 .../CollationRegexpExpressionsSuite.scala          | 71 +++---
 .../apache/spark/sql/CollationSQLRegexpSuite.scala | 31 +-
 .../sql/CollationStringExpressionsSuite.scala      | 32 +-
 .../streaming/StreamingDeduplicationSuite.scala    |  2 +-
 9 files changed, 86 insertions(+), 111 deletions(-)
(spark) branch master updated: [SPARK-48526][SS] Allow passing custom sink to testStream()
wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 3878b57e6e88 [SPARK-48526][SS] Allow passing custom sink to testStream()

3878b57e6e88 is described below

commit 3878b57e6e88631826c1c8690eb9052e5efa5aa1
Author: Johan Lasperas
AuthorDate: Thu Jun 6 11:19:53 2024 -0700

    [SPARK-48526][SS] Allow passing custom sink to testStream()

    ### What changes were proposed in this pull request?
    Update `StreamTest:testStream()` to allow passing a custom sink. This allows writing better tests covering streaming sinks, in particular:
    - reusing a sink across calls to testStream.
    - passing a custom sink implementation.

    ### Why are the changes needed?
    Better testing infrastructure.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    N/A

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #46866 from johanl-db/allow-passing-custom-sink-stream-test.

    Authored-by: Johan Lasperas
    Signed-off-by: Wenchen Fan
---
 .../src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala
index d7401897ff6a..7439c7ab6d6e 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala
@@ -346,7 +346,8 @@ trait StreamTest extends QueryTest with SharedSparkSession with TimeLimits with
   def testStream(
       _stream: Dataset[_],
       outputMode: OutputMode = OutputMode.Append,
-      extraOptions: Map[String, String] = Map.empty)(actions: StreamAction*): Unit = synchronized {
+      extraOptions: Map[String, String] = Map.empty,
+      sink: MemorySink = new MemorySink())(actions: StreamAction*): Unit = synchronized {
     import org.apache.spark.sql.streaming.util.StreamManualClock

     // `synchronized` is added to prevent the user from calling multiple `testStream`s concurrently
@@ -359,7 +360,6 @@ trait StreamTest extends QueryTest with SharedSparkSession with TimeLimits with
     var currentStream: StreamExecution = null
     var lastStream: StreamExecution = null
     val awaiting = new mutable.HashMap[Int, OffsetV2]() // source index -> offset to wait for
-    val sink = new MemorySink
     val resetConfValues = mutable.Map[String, Option[String]]()
     val defaultCheckpointLocation =
       Utils.createTempDir(namePrefix = "streaming.metadata").getCanonicalPath
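A hypothetical usage of the new `sink` parameter from inside a suite extending `StreamTest` (so `testStream`, `AddData`, `CheckAnswer`, and the test implicits are in scope; the import paths and stream contents below are assumptions for illustration):

```
import org.apache.spark.sql.execution.streaming.MemoryStream
import org.apache.spark.sql.execution.streaming.sources.MemorySink
import org.apache.spark.sql.streaming.OutputMode

val input = MemoryStream[Int]
val sharedSink = new MemorySink()

// First run populates the shared sink.
testStream(input.toDF(), OutputMode.Append, sink = sharedSink)(
  AddData(input, 1, 2),
  CheckAnswer(1, 2)
)

// Second run reuses the same sink, so output from the first run is still visible.
testStream(input.toDF(), OutputMode.Append, sink = sharedSink)(
  AddData(input, 3),
  CheckAnswer(1, 2, 3)
)
```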
(spark) branch master updated (7cba1ab4d6ac -> 9f4007f3d89e)
wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

    from 7cba1ab4d6ac [SPARK-48554][INFRA] Use R 4.4.0 in `windows` R GitHub Action Window job
     add 9f4007f3d89e [SPARK-48546][SQL] Fix ExpressionEncoder after replacing NullPointerExceptions with proper error classes in AssertNotNull expression

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/encoders/ExpressionEncoder.scala  |  5 +++
 .../catalyst/encoders/EncoderResolutionSuite.scala | 15 +++-
 .../sql/catalyst/encoders/RowEncoderSuite.scala    |  2 +-
 .../scala/org/apache/spark/sql/DatasetSuite.scala  | 40 ++
 4 files changed, 29 insertions(+), 33 deletions(-)
(spark) branch master updated: [SPARK-47552][CORE][FOLLOWUP] Set spark.hadoop.fs.s3a.connection.establish.timeout to numeric
wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 966c3d9ef1ed [SPARK-47552][CORE][FOLLOWUP] Set spark.hadoop.fs.s3a.connection.establish.timeout to numeric

966c3d9ef1ed is described below

commit 966c3d9ef1edc8b2f7d53b8a592ff4e2a2f9b80b
Author: Wenchen Fan
AuthorDate: Wed Jun 5 20:49:03 2024 -0700

    [SPARK-47552][CORE][FOLLOWUP] Set spark.hadoop.fs.s3a.connection.establish.timeout to numeric

    ### What changes were proposed in this pull request?
    This is a followup of https://github.com/apache/spark/pull/45710 . Some custom `FileSystem` implementations read the `hadoop.fs.s3a.connection.establish.timeout` config as numeric and do not support the `30s` syntax. To make it safe, this PR proposes to set this conf to `30000` instead of `30s`. I checked the doc page and this config is in milliseconds.

    ### Why are the changes needed?
    More compatible with custom `FileSystem` implementations.

    ### Does this PR introduce _any_ user-facing change?
    no

    ### How was this patch tested?
    manual

    ### Was this patch authored or co-authored using generative AI tooling?
    no

    Closes #46874 from cloud-fan/follow.

    Authored-by: Wenchen Fan
    Signed-off-by: Wenchen Fan
---
 core/src/main/scala/org/apache/spark/SparkContext.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index 90d8cef00ef8..6eb2bea40bdb 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -421,7 +421,7 @@ class SparkContext(config: SparkConf) extends Logging {
     }
     // HADOOP-19097 Set fs.s3a.connection.establish.timeout to 30s
     // We can remove this after Apache Hadoop 3.4.1 releases
-    conf.setIfMissing("spark.hadoop.fs.s3a.connection.establish.timeout", "30s")
+    conf.setIfMissing("spark.hadoop.fs.s3a.connection.establish.timeout", "30000")
     // This should be set as early as possible.
     SparkContext.fillMissingMagicCommitterConfsIfNeeded(_conf)
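Since `setIfMissing` only supplies a default, a deployment that needs a different timeout can still override it before the `SparkContext` is created; a hypothetical example (the app name and value are invented):

```
import org.apache.spark.SparkConf

// A user-provided value wins over the setIfMissing default applied in SparkContext.
val conf = new SparkConf()
  .setAppName("example")
  .set("spark.hadoop.fs.s3a.connection.establish.timeout", "5000")  // 5s, numeric form
```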
(spark) branch master updated: [SPARK-48307][SQL][FOLLOWUP] Allow outer references in un-referenced CTE relations
wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new d5c33c6bfb57 [SPARK-48307][SQL][FOLLOWUP] Allow outer references in un-referenced CTE relations

d5c33c6bfb57 is described below

commit d5c33c6bfb5757b243fc8e1734daeaa4fe3b9b32
Author: Wenchen Fan
AuthorDate: Wed Jun 5 14:38:44 2024 -0700

    [SPARK-48307][SQL][FOLLOWUP] Allow outer references in un-referenced CTE relations

    ### What changes were proposed in this pull request?
    This is a followup of https://github.com/apache/spark/pull/46617 . Subquery expressions have a bunch of correlation checks which need to match certain plan shapes. We broke this by leaving `WithCTE` in the plan for un-referenced CTE relations. This PR fixes the issue by skipping CTE plan nodes in correlated subquery expression checks.

    ### Why are the changes needed?
    bug fix

    ### Does this PR introduce _any_ user-facing change?
    no, the bug is not released yet

    ### How was this patch tested?
    new tests

    ### Was this patch authored or co-authored using generative AI tooling?
    no

    Closes #46869 from cloud-fan/check.

    Authored-by: Wenchen Fan
    Signed-off-by: Wenchen Fan
---
 .../sql/catalyst/analysis/CheckAnalysis.scala      |  7 +
 .../plans/logical/basicLogicalOperators.scala      |  4 +++
 .../sql-tests/analyzer-results/cte-legacy.sql.out  | 24 +++
 .../sql-tests/analyzer-results/cte-nested.sql.out  | 34 ++
 .../analyzer-results/cte-nonlegacy.sql.out         | 34 ++
 .../test/resources/sql-tests/inputs/cte-nested.sql | 12 
 .../resources/sql-tests/results/cte-legacy.sql.out | 22 ++
 .../resources/sql-tests/results/cte-nested.sql.out | 22 ++
 .../sql-tests/results/cte-nonlegacy.sql.out        | 22 ++
 9 files changed, 181 insertions(+)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
index 8c380a7228c6..f4408220ac93 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
@@ -1371,6 +1371,13 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB
             aggregated,
             canContainOuter && SQLConf.get.getConf(SQLConf.DECORRELATE_OFFSET_ENABLED))

+        // We always inline CTE relations before analysis check, and only un-referenced CTE
+        // relations will be kept in the plan. Here we should simply skip them and check the
+        // children, as un-referenced CTE relations won't be executed anyway and doesn't need to
+        // be restricted by the current subquery correlation limitations.
+        case _: WithCTE | _: CTERelationDef =>
+          plan.children.foreach(p => checkPlan(p, aggregated, canContainOuter))
+
         // Category 4: Any other operators not in the above 3 categories
         // cannot be on a correlation path, that is they are allowed only
         // under a correlation point but they and their descendant operators
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
index 9242a06cf1d6..0135fcfb3cc8 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
@@ -911,6 +911,10 @@ case class WithCTE(plan: LogicalPlan, cteDefs: Seq[CTERelationDef]) extends Logi
   def withNewPlan(newPlan: LogicalPlan): WithCTE = {
     withNewChildren(children.init :+ newPlan).asInstanceOf[WithCTE]
   }
+
+  override def maxRows: Option[Long] = plan.maxRows
+
+  override def maxRowsPerPartition: Option[Long] = plan.maxRowsPerPartition
 }

 /**
diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/cte-legacy.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/cte-legacy.sql.out
index 594a30b054ed..f9b78e94236f 100644
--- a/sql/core/src/test/resources/sql-tests/analyzer-results/cte-legacy.sql.out
+++ b/sql/core/src/test/resources/sql-tests/analyzer-results/cte-legacy.sql.out
@@ -43,6 +43,30 @@ Project [scalar-subquery#x [] AS scalarsubquery()#x]
    +- OneRowRelation

+-- !query
+SELECT (
+  WITH unreferenced AS (SELECT id)
+  SELECT 1
+) FROM range(1)
+-- !query analysis
+Project [scalar-subquery#x [] AS scalarsubquery()#x]
+:  +- Project [1 AS 1#x]
+:     +- OneRowRelation
++- Range
(spark) branch master updated (34ac7de89711 -> 490a4b3b1fdf)
wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

    from 34ac7de89711 [SPARK-48536][PYTHON][CONNECT] Cache user specified schema in applyInPandas and applyInArrow
     add 490a4b3b1fdf [SPARK-48498][SQL] Always do char padding in predicates

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/SQLConf.scala    |  8 +
 .../datasources/ApplyCharTypePadding.scala         | 39 --
 .../apache/spark/sql/CharVarcharTestSuite.scala    | 28 
 .../org/apache/spark/sql/PlanStabilitySuite.scala  |  8 ++--
 4 files changed, 70 insertions(+), 13 deletions(-)
(spark) branch master updated: [SPARK-48307][SQL] InlineCTE should keep not-inlined relations in the original WithCTE node
wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 8a0927c07a14 [SPARK-48307][SQL] InlineCTE should keep not-inlined relations in the original WithCTE node

8a0927c07a14 is described below

commit 8a0927c07a1483bcd9125bdc2062a63759b0a337
Author: Wenchen Fan
AuthorDate: Tue Jun 4 15:04:22 2024 -0700

    [SPARK-48307][SQL] InlineCTE should keep not-inlined relations in the original WithCTE node

    ### What changes were proposed in this pull request?
    I noticed an outdated comment in the rule `InlineCTE`
    ```
    // CTEs in SQL Commands have been inlined by `CTESubstitution` already, so it is safe to add
    // WithCTE as top node here.
    ```
    This is not true anymore after https://github.com/apache/spark/pull/42036 . It's not a big deal as we replace not-inlined CTE relations with `Repartition` during optimization, so it doesn't matter where we put the `WithCTE` node with not-inlined CTE relations, as it will disappear eventually. But it's still better to keep it at its original place, as third-party rules may be sensitive about the plan shape.

    ### Why are the changes needed?
    to keep the plan shape as much as possible after inlining CTE relations

    ### Does this PR introduce _any_ user-facing change?
    no

    ### How was this patch tested?
    new test

    ### Was this patch authored or co-authored using generative AI tooling?
    no

    Closes #46617 from cloud-fan/cte.

    Lead-authored-by: Wenchen Fan
    Co-authored-by: Wenchen Fan
    Signed-off-by: Wenchen Fan
---
 .../sql/catalyst/analysis/CheckAnalysis.scala      |  45 +--
 .../spark/sql/catalyst/optimizer/InlineCTE.scala   | 133 +
 .../sql/catalyst/optimizer/InlineCTESuite.scala    |  42 +++
 3 files changed, 132 insertions(+), 88 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
index 1c2baa78be1b..8c380a7228c6 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
@@ -143,50 +143,17 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB
         errorClass, missingCol, orderedCandidates, a.origin)
   }

-  private def checkUnreferencedCTERelations(
-      cteMap: mutable.Map[Long, (CTERelationDef, Int, mutable.Map[Long, Int])],
-      visited: mutable.Map[Long, Boolean],
-      danglingCTERelations: mutable.ArrayBuffer[CTERelationDef],
-      cteId: Long): Unit = {
-    if (visited(cteId)) {
-      return
-    }
-    val (cteDef, _, refMap) = cteMap(cteId)
-    refMap.foreach { case (id, _) =>
-      checkUnreferencedCTERelations(cteMap, visited, danglingCTERelations, id)
-    }
-    danglingCTERelations.append(cteDef)
-    visited(cteId) = true
-  }
-
   def checkAnalysis(plan: LogicalPlan): Unit = {
-    val inlineCTE = InlineCTE(alwaysInline = true)
-    val cteMap = mutable.HashMap.empty[Long, (CTERelationDef, Int, mutable.Map[Long, Int])]
-    inlineCTE.buildCTEMap(plan, cteMap)
-    val danglingCTERelations = mutable.ArrayBuffer.empty[CTERelationDef]
-    val visited: mutable.Map[Long, Boolean] = mutable.Map.empty.withDefaultValue(false)
-    // If a CTE relation is never used, it will disappear after inline. Here we explicitly collect
-    // these dangling CTE relations, and put them back in the main query, to make sure the entire
-    // query plan is valid.
-    cteMap.foreach { case (cteId, (_, refCount, _)) =>
-      // If a CTE relation ref count is 0, the other CTE relations that reference it should also be
-      // collected. This code will also guarantee the leaf relations that do not reference
-      // any others are collected first.
-      if (refCount == 0) {
-        checkUnreferencedCTERelations(cteMap, visited, danglingCTERelations, cteId)
-      }
-    }
-    // Inline all CTEs in the plan to help check query plan structures in subqueries.
-    var inlinedPlan: LogicalPlan = plan
-    try {
-      inlinedPlan = inlineCTE(plan)
+    // We should inline all CTE relations to restore the original plan shape, as the analysis check
+    // may need to match certain plan shapes. For dangling CTE relations, they will still be kept
+    // in the original `WithCTE` node, as we need to perform analysis check for them as well.
+    val inlineCTE = InlineCTE(alwaysInline = true, keepDanglingRelations = true)
+    val inlinedPlan: LogicalPlan = try {
+      inlineCTE(plan)
     } catch {
       case e: AnalysisException =>
         throw new ExtendedAnalysisEx
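For context, a "dangling" (never-referenced) CTE relation of the kind this rule now keeps inside the original `WithCTE` node can be produced with a query like the following (illustrative example assuming a SparkSession named `spark`; not from the commit):

```
// `unused` is never referenced by the main query, so it cannot be inlined;
// with keepDanglingRelations = true it stays in the original WithCTE node
// and is only dropped later during optimization, as described above.
spark.sql("WITH unused AS (SELECT 1 AS a) SELECT 2 AS b").show()
```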
(spark) branch master updated (651f68782ab7 -> c7caac9b10ca)
wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

    from 651f68782ab7 [SPARK-48531][INFRA] Fix `Black` target version to Python 3.9
     add c7caac9b10ca [SPARK-47972][SQL][FOLLOWUP] Restrict CAST expression for collations

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 1 -
 1 file changed, 1 deletion(-)
(spark) branch master updated: [SPARK-48318][SQL] Enable hash join support for all collations (complex types)
wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new c852c4f72acb [SPARK-48318][SQL] Enable hash join support for all collations (complex types)

c852c4f72acb is described below

commit c852c4f72acb658ff0193f16b526c8f653188a4e
Author: Uros Bojanic <157381213+uros...@users.noreply.github.com>
AuthorDate: Tue Jun 4 00:10:50 2024 -0700

    [SPARK-48318][SQL] Enable hash join support for all collations (complex types)

    ### What changes were proposed in this pull request?
    Enable collation support for hash join on complex types.
    - Logical plan is rewritten in analysis to (recursively) replace all non-binary strings with CollationKey
    - CollationKey is a unary expression that transforms StringType to BinaryType
    - Collation keys allow correct & efficient string comparison under specific collation rules

    ### Why are the changes needed?
    Improve JOIN performance for complex types containing collated strings.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    - Unit tests for `CollationKey` in `CollationExpressionSuite`
    - E2e SQL tests for `RewriteCollationJoin` in `CollationSuite`
    - Various queries with JOIN in existing TPCDS collation test suite

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #46722 from uros-db/hash-join-cmx.

    Authored-by: Uros Bojanic <157381213+uros...@users.noreply.github.com>
    Signed-off-by: Wenchen Fan
---
 .../catalyst/analysis/RewriteCollationJoin.scala   |  72 ++-
 .../org/apache/spark/sql/CollationSuite.scala      | 228 -
 2 files changed, 289 insertions(+), 11 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteCollationJoin.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteCollationJoin.scala
index fd443fd19a1f..ae29d21c7a71 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteCollationJoin.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteCollationJoin.scala
@@ -17,24 +17,27 @@

 package org.apache.spark.sql.catalyst.analysis

-import org.apache.spark.sql.catalyst.expressions.{AttributeReference, CollationKey, Equality}
+import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.logical.{Join, LogicalPlan}
 import org.apache.spark.sql.catalyst.rules.Rule
-import org.apache.spark.sql.catalyst.util.CollationFactory
+import org.apache.spark.sql.catalyst.util.UnsafeRowUtils
+import org.apache.spark.sql.types._
 import org.apache.spark.sql.types.StringType
+import org.apache.spark.util.ArrayImplicits.SparkArrayOps

+/**
+ * This rule rewrites Join conditions to ensure that all types containing non-binary collated
+ * strings are compared correctly. This is necessary because join conditions are evaluated using
+ * binary equality, which does not work correctly for non-binary collated strings. However, by
+ * injecting CollationKey expressions into the join condition, we can ensure that the comparison
+ * is done correctly, which then allows HashJoin to work properly on this type of data.
+ */
 object RewriteCollationJoin extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case j @ Join(_, _, _, Some(condition), _) =>
       val newCondition = condition transform {
         case e @ Equality(l: AttributeReference, r: AttributeReference) =>
-          (l.dataType, r.dataType) match {
-            case (st: StringType, _: StringType)
-                if !CollationFactory.fetchCollation(st.collationId).supportsBinaryEquality =>
-              e.withNewChildren(Seq(CollationKey(l), CollationKey(r)))
-            case _ =>
-              e
-          }
+          e.withNewChildren(Seq(processExpression(l, l.dataType), processExpression(r, r.dataType)))
       }
       if (!newCondition.fastEquals(condition)) {
         j.copy(condition = Some(newCondition))
@@ -42,4 +45,55 @@ object RewriteCollationJoin extends Rule[LogicalPlan] {
         j
       }
   }
+
+  /**
+   * Recursively process the expression in order to replace non-binary collated strings with their
+   * associated collation keys. This is necessary to ensure that the join condition is evaluated
+   * correctly for all types containing non-binary collated strings, including structs and arrays.
+   */
+  private def processExpression(expr: Expression, dt: DataType): Expression = {
+    dt match {
+      // For binary stable expressions, no special handling is needed.
+      case _ if UnsafeRowUtils.isBinaryStable(dt) =>
+        expr
+
+      // Inj
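A hypothetical end-to-end query of the kind this rewrite targets (assumes a SparkSession named `spark`; the table layout is invented, and the collation name follows the UTF8_BINARY_LCASE used elsewhere in these commits):

```
// Equi-join on a struct containing a non-binary collated string field; plain
// binary equality on s would be incorrect for this collation.
spark.sql("CREATE TABLE t1 (s STRUCT<c: STRING COLLATE UTF8_BINARY_LCASE>) USING parquet")
spark.sql("CREATE TABLE t2 (s STRUCT<c: STRING COLLATE UTF8_BINARY_LCASE>) USING parquet")

// RewriteCollationJoin injects CollationKey under the join keys, which lets
// HashJoin work properly on this type of data (per the rule's doc comment above).
spark.sql("SELECT * FROM t1 JOIN t2 ON t1.s = t2.s").explain()
```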
(spark) branch master updated: [SPARK-47972][SQL] Restrict CAST expression for collations
wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new e4e8bb5936d3 [SPARK-47972][SQL] Restrict CAST expression for collations

e4e8bb5936d3 is described below

commit e4e8bb5936d305d27961c3a9c04d06ee1901977f
Author: Mihailo Milosevic
AuthorDate: Mon Jun 3 16:16:48 2024 -0700

    [SPARK-47972][SQL] Restrict CAST expression for collations

    ### What changes were proposed in this pull request?
    Block the syntax CAST(value AS STRING COLLATE collation_name).

    ### Why are the changes needed?
    The current state of the code allows calls like CAST(1 AS STRING COLLATE UNICODE). We want to restrict the CAST expression to only cast to the default collation string, and to only allow the COLLATE expression to produce explicitly collated strings.

    ### Does this PR introduce _any_ user-facing change?
    Yes.

    ### How was this patch tested?
    Test in CollationSuite.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #46474 from mihailom-db/SPARK-47972.

    Authored-by: Mihailo Milosevic
    Signed-off-by: Wenchen Fan
---
 .../sql/catalyst/analysis/CollationTypeCasts.scala |  2 --
 .../spark/sql/catalyst/parser/AstBuilder.scala     | 29 
 .../org/apache/spark/sql/CollationSuite.scala      | 40 ++
 3 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala
index 16f8ec78e03e..b832cd4416a9 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala
@@ -132,8 +132,6 @@ object CollationTypeCasts extends TypeCoercionRule {
   def getOutputCollation(expr: Seq[Expression]): StringType = {
     val explicitTypes = expr.filter {
         case _: Collate => true
-        case cast: Cast if cast.getTagValue(Cast.USER_SPECIFIED_CAST).isDefined =>
-          cast.dataType.isInstanceOf[StringType]
         case _ => false
       }
       .map(_.dataType.asInstanceOf[StringType].collationId)
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index e2c975433ebd..86490a2eea97 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -20,6 +20,7 @@ package org.apache.spark.sql.catalyst.parser
 import java.util.Locale
 import java.util.concurrent.TimeUnit

+import scala.collection.immutable.Seq
 import scala.collection.mutable.{ArrayBuffer, Set}
 import scala.jdk.CollectionConverters._
 import scala.util.{Left, Right}
@@ -2265,6 +2266,20 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
    */
  override def visitCast(ctx: CastContext): Expression = withOrigin(ctx) {
    val rawDataType = typedVisit[DataType](ctx.dataType())
+    ctx.dataType() match {
+      case context: PrimitiveDataTypeContext =>
+        val typeCtx = context.`type`()
+        if (typeCtx.start.getType == STRING) {
+          typeCtx.children.asScala.toSeq match {
+            case Seq(_, cctx: CollateClauseContext) =>
+              throw QueryParsingErrors.dataTypeUnsupportedError(
+                rawDataType.typeName,
+                ctx.dataType().asInstanceOf[PrimitiveDataTypeContext])
+            case _ =>
+          }
+        }
+      case _ =>
+    }
    val dataType = CharVarcharUtils.replaceCharVarcharWithStringForCast(rawDataType)
    ctx.name.getType match {
      case SqlBaseParser.CAST =>
@@ -2284,6 +2299,20 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
    */
  override def visitCastByColon(ctx: CastByColonContext): Expression = withOrigin(ctx) {
    val rawDataType = typedVisit[DataType](ctx.dataType())
+    ctx.dataType() match {
+      case context: PrimitiveDataTypeContext =>
+        val typeCtx = context.`type`()
+        if (typeCtx.start.getType == STRING) {
+          typeCtx.children.asScala.toSeq match {
+            case Seq(_, cctx: CollateClauseContext) =>
+              throw QueryParsingErrors.dataTypeUnsupportedError(
+                rawDataType.typeName,
+                ctx.dataType().asInstanceOf[PrimitiveDataTypeContext])
+            case _ =>
+          }
+        }
+      case _ =>
+    }
    val dataType = CharVarcharUtils.replaceCharVarcharWithStringForCast(rawDataType)
    val cast = C
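A hypothetical illustration of what is now rejected versus what remains supported (assumes a SparkSession named `spark`; the expression-level COLLATE syntax exists independently of this change):

```
// Rejected after this change: CAST may only produce default-collation strings.
spark.sql("SELECT CAST(1 AS STRING COLLATE UNICODE)")   // throws a parse error

// Still allowed: produce an explicitly collated string via COLLATE.
spark.sql("SELECT CAST(1 AS STRING) COLLATE UNICODE")
```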
(spark) branch master updated: [SPARK-48413][SQL] ALTER COLUMN with collation
wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new f9542d008402 [SPARK-48413][SQL] ALTER COLUMN with collation

f9542d008402 is described below

commit f9542d008402f8cef96d5ec347583c7c1d30d840
Author: Nikola Mandic
AuthorDate: Mon Jun 3 13:00:34 2024 -0700

    [SPARK-48413][SQL] ALTER COLUMN with collation

    ### What changes were proposed in this pull request?
    Add support for changing the collation of a column with the `ALTER COLUMN` command. Use the existing support for `ALTER COLUMN` with type to enable changing the collation of a column. Syntax example:
    ```
    ALTER TABLE t1 ALTER COLUMN col TYPE STRING COLLATE UTF8_BINARY_LCASE
    ```

    ### Why are the changes needed?
    Enable changing the collation of a column.

    ### Does this PR introduce _any_ user-facing change?
    Yes, it adds support for changing the collation of a column.

    ### How was this patch tested?
    Added tests to `DDLSuite` and `DataTypeSuite`.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #46734 from nikolamand-db/SPARK-48413.

    Authored-by: Nikola Mandic
    Signed-off-by: Wenchen Fan
---
 .../src/main/resources/error/error-conditions.json |   6 ++
 .../org/apache/spark/sql/types/DataType.scala      |  35 +++
 .../spark/sql/errors/QueryCompilationErrors.scala  |   9 ++
 .../org/apache/spark/sql/types/DataTypeSuite.scala | 109 +
 .../apache/spark/sql/execution/command/ddl.scala   |  50 +++---
 .../spark/sql/execution/command/DDLSuite.scala     |  94 ++
 6 files changed, 290 insertions(+), 13 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-conditions.json b/common/utils/src/main/resources/error/error-conditions.json
index 69965e58fb79..5bab14e3eebf 100644
--- a/common/utils/src/main/resources/error/error-conditions.json
+++ b/common/utils/src/main/resources/error/error-conditions.json
@@ -119,6 +119,12 @@
     ],
     "sqlState" : "42KDE"
   },
+  "CANNOT_ALTER_COLLATION_BUCKET_COLUMN" : {
+    "message" : [
+      "ALTER TABLE (ALTER|CHANGE) COLUMN cannot change collation of type/subtypes of bucket columns, but found the bucket column <columnName> in the table <tableName>."
+    ],
+    "sqlState" : "428FR"
+  },
   "CANNOT_ALTER_PARTITION_COLUMN" : {
     "message" : [
       "ALTER TABLE (ALTER|CHANGE) COLUMN is not supported for partition columns, but found the partition column <columnName> in the table <tableName>."
diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala
index ea90aa2ca397..12c7905f62d1 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala
@@ -408,6 +408,41 @@ object DataType {
     }
   }

+  /**
+   * Check if `from` is equal to `to` type except for collations, which are checked to be
+   * compatible so that data of type `from` can be interpreted as of type `to`.
+   */
+  private[sql] def equalsIgnoreCompatibleCollation(
+      from: DataType,
+      to: DataType): Boolean = {
+    (from, to) match {
+      // String types with possibly different collations are compatible.
+      case (_: StringType, _: StringType) => true
+
+      case (ArrayType(fromElement, fromContainsNull), ArrayType(toElement, toContainsNull)) =>
+        (fromContainsNull == toContainsNull) &&
+          equalsIgnoreCompatibleCollation(fromElement, toElement)
+
+      case (MapType(fromKey, fromValue, fromContainsNull),
+          MapType(toKey, toValue, toContainsNull)) =>
+        fromContainsNull == toContainsNull &&
+          // Map keys cannot change collation.
+          fromKey == toKey &&
+          equalsIgnoreCompatibleCollation(fromValue, toValue)
+
+      case (StructType(fromFields), StructType(toFields)) =>
+        fromFields.length == toFields.length &&
+          fromFields.zip(toFields).forall { case (fromField, toField) =>
+            fromField.name == toField.name &&
+              fromField.nullable == toField.nullable &&
+              fromField.metadata == toField.metadata &&
+              equalsIgnoreCompatibleCollation(fromField.dataType, toField.dataType)
+          }
+
+      case (fromDataType, toDataType) => fromDataType == toDataType
+    }
+  }
+
   /**
    * Returns true if the two data types share the same "shape", i.e. the types
    * are the same, but the field names don't need to be the same.
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql
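A hypothetical pair of statements showing the new command and its bucket-column restriction (table layout invented; the error condition is the `CANNOT_ALTER_COLLATION_BUCKET_COLUMN` entry added above, and assumes a SparkSession named `spark`):

```
// Hypothetical: a bucketed table whose bucket column's collation may not change.
spark.sql("CREATE TABLE t (c STRING, d STRING) USING parquet CLUSTERED BY (c) INTO 4 BUCKETS")

// Allowed: d is not a bucket column.
spark.sql("ALTER TABLE t ALTER COLUMN d TYPE STRING COLLATE UTF8_BINARY_LCASE")

// Fails with CANNOT_ALTER_COLLATION_BUCKET_COLUMN: c is the bucket column.
spark.sql("ALTER TABLE t ALTER COLUMN c TYPE STRING COLLATE UTF8_BINARY_LCASE")
```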
(spark) branch master updated: [SPARK-48503][SQL] Fix invalid scalar subqueries with group-by on non-equivalent columns that were incorrectly allowed
wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 5d71ef0716f7 [SPARK-48503][SQL] Fix invalid scalar subqueries with group-by on non-equivalent columns that were incorrectly allowed

5d71ef0716f7 is described below

commit 5d71ef0716f7a2d470d05bf3c04441382cd80138
Author: Jack Chen
AuthorDate: Mon Jun 3 10:51:11 2024 -0700

    [SPARK-48503][SQL] Fix invalid scalar subqueries with group-by on non-equivalent columns that were incorrectly allowed

    ### What changes were proposed in this pull request?
    Fixes CheckAnalysis to reject invalid scalar subquery group-bys that were previously allowed and returned wrong results. For example, this query is not legal and should give an error, but instead we incorrectly allowed it and it returns wrong results prior to this PR (full repro with table data in the jira):
    ```
    select *, (select count(*) from y where y1 > x1 group by y1) from x;
    ```
    It returns two rows, even though there's only one row of x. The correct result is an error, because there is more than one row returned by the scalar subquery.

    Another problem case is if the correlation condition is an equality but it's under another operator like an OUTER JOIN or UNION. Various other expressions that are not equi-joins between the inner and outer fields hit this too, e.g. `where y1 + y2 = x1 group by y1`. See the comments in the code and the tests for more examples.

    This PR fixes the logic which checks for valid vs invalid group-bys. However, note that this new logic may block some queries that are actually valid, for example `a + 1 = outer(b)` is valid but would be rejected. Therefore, we add a conf flag that can be used to restore the legacy behavior, as well as logging for when the legacy behavior is used and differs from the new behavior. (In general, to accurately run valid queries and reject invalid queries, the check must be moved from com [...]

    This is a longstanding bug. The bug is in CheckAnalysis in checkAggregateInScalarSubquery. It allows grouping columns that are present in correlation predicates, but doesn't check whether those predicates are equalities - because when that code was written, non-equality correlation wasn't allowed. Therefore, this bug has existed since non-equality correlation was added (~2 years ago).

    ### Why are the changes needed?
    Fix invalid queries returning wrong results.

    ### Does this PR introduce _any_ user-facing change?
    Yes, block subqueries with invalid group-bys.

    ### How was this patch tested?
    Add tests

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #46839 from jchen5/scalar-subq-gby.

    Authored-by: Jack Chen
    Signed-off-by: Wenchen Fan
---
 .../sql/catalyst/analysis/CheckAnalysis.scala      |  38 +++-
 .../spark/sql/catalyst/expressions/subquery.scala  |  72 ++-
 .../org/apache/spark/sql/internal/SQLConf.scala    |   9 +
 .../scalar-subquery-group-by.sql.out               | 206 
 .../scalar-subquery/scalar-subquery-group-by.sql   |  28 +++
 .../scalar-subquery-group-by.sql.out               | 211 +
 6 files changed, 555 insertions(+), 9 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
index e18f4d1b36e1..1c2baa78be1b 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
@@ -19,6 +19,7 @@ package org.apache.spark.sql.catalyst.analysis
 import scala.collection.mutable

 import org.apache.spark.SparkException
+import org.apache.spark.internal.Logging
 import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.catalyst.ExtendedAnalysisException
 import org.apache.spark.sql.catalyst.expressions._
@@ -41,7 +42,7 @@ import org.apache.spark.util.Utils
 /**
  * Throws user facing errors when passed invalid queries that fail to analyze.
  */
-trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsBase {
+trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsBase with Logging {

   protected def isView(nameParts: Seq[String]): Boolean

@@ -912,13 +913,36 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB
           // SPARK-18504/SPARK-18814: Block cases where GROUP BY columns
           // are not part of the correlated columns.
+
+          // Note: groupByCols does not contain outer refs - grouping by an outer ref is always ok
           val groupByC
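For contrast with the repro above, a form the check still accepts, since the grouping column is bound by an equality correlation (illustrative, reusing the x/y tables from the commit message and assuming a SparkSession named `spark`):

```
// Rejected after this fix: y1 is only constrained by a non-equality predicate,
// so one outer row can match several y1 groups and the subquery can return
// more than one row.
spark.sql("SELECT *, (SELECT count(*) FROM y WHERE y1 > x1 GROUP BY y1) FROM x")

// Still valid: the equi-correlation y1 = x1 pins the grouping column to a single
// value per outer row, so the scalar subquery returns at most one row.
spark.sql("SELECT *, (SELECT count(*) FROM y WHERE y1 = x1 GROUP BY y1) FROM x")
```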
svn commit: r69509 - in /dev/spark: v4.0.0-preview1-rc1-bin/ v4.0.0-preview1-rc1-docs/ v4.0.0-preview1-rc2-bin/ v4.0.0-preview1-rc2-docs/
Author: wenchen Date: Mon Jun 3 01:12:24 2024 New Revision: 69509 Log: Removing RC artifacts. Removed: dev/spark/v4.0.0-preview1-rc1-bin/ dev/spark/v4.0.0-preview1-rc1-docs/ dev/spark/v4.0.0-preview1-rc2-bin/ dev/spark/v4.0.0-preview1-rc2-docs/
svn commit: r69508 - /release/spark/KEYS
Author: wenchen Date: Mon Jun 3 00:59:52 2024 New Revision: 69508 Log: Update KEYS Modified: release/spark/KEYS Modified: release/spark/KEYS == --- release/spark/KEYS (original) +++ release/spark/KEYS Mon Jun 3 00:59:52 2024 @@ -2079,3 +2079,61 @@ ThVo7dEVoknhannfoULNv5ekjZ/LsFNGHRUZ =9cvL -END PGP PUBLIC KEY BLOCK- +pub rsa4096 2024-05-07 [SC] + 4DC9676CEF9A83E98FCA02784D6620843CD87F5A +uid Wenchen Fan (CODE SIGNING KEY) +sub rsa4096 2024-05-07 [E] + +-BEGIN PGP PUBLIC KEY BLOCK- + +mQINBGY6XpcBEADBeNz3IBYriwrPzMYJJO5u1DaWAJ4Sryx6PUZgvssrcqojYVTh +MjtlBkWRcNquAyDrVlU1vtq1yMq5KopQoAEi/l3xaEDZZ0IFAob6+GlGXEon2Jvf +0FXQsx+Df4nMVl7KPqh68T++Z4GkvK5wyyN9uaUTWL2deGeinVxTh6qWQT8YiCd5 +wof+Dk5IIzKQ5VIBhU/U9S0jo/pqhH4okcZGTyT2Q7sfg4eXl5+Y2OR334RkvTcX +uJjcnJ8BUbBSm1UhNg4OGBEJgi+lE1GEgw4juOfTAPh9fx8SCLhuX0m6Qc/y9bAK +Q4zejbF5F2Um9dqrZqg6Egp+nlzydn59hq9owSnQ6JdoA/PLcgoign0sghu9xGCR +GpgI2kS7Q8bu6dy7T0BfUerLZ1FHu7nCT2ZNSIh/Y2eOhuBhUr3llg8xa3PZZob/ +2sZE2dJ3g/qp2Nbo+s5Q5kELtuo6cZD0EISQwt68hGWIgxs0vtci2c2kQYFS0oqw +fGynEeDFZRHV3ET5rioYaoPi70Cnibght5ocL0t6sl0RQQVp6k2i1aofJbZA480N +ivuJ5agGaSRxmIDk6JlDsHJGxO9oC066ZLJiR6i0JUinGP7sw/nNmgup/AB+y4hW +9WdeAFyYmuYysDRRyE6z1MPDp1R00MyGxHNFDF64/JPY/nKKFdXp+aCazwARAQAB +tDNXZW5jaGVuIEZhbiAoQ09ERSBTSUdOSU5HIEtFWSkgPHdlbmNoZW5AYXBhY2hl +Lm9yZz6JAlEEEwEIADsWIQRNyWds75qD6Y/KAnhNZiCEPNh/WgUCZjpelwIbAwUL +CQgHAgIiAgYVCgkICwIEFgIDAQIeBwIXgAAKCRBNZiCEPNh/WkofD/9sI7J3i9Ck +NOlHpVnjAaHjyGX5cVA2dZGniJdLf5yOKOI6pu7dMW+NThsXO1Iv+BRYo7una6/Q +vUquKKxCXIN3vNmKIB1e9lj4MaIhCRmXUSQxjkVa9JW3P/F520Ct3VjiCZ5IjPv+ +g1hF/wrkuuoAFlcC/bfGWafkaZgszavSpCdp27mUXUNbvLW0dPJ3+ay4cDPuT1DI +6DhB8qpqN7gInDFACW2qtQ2KZh1JFGy5ZccQ9dB3t/B4BYiUie6a3eQWgKqLF1hw +8yHY3DkCVGfnXJk4+LMWqgazQxoB6oZjBvoQYtGOPXr1ZbmtiRHCDM5KmZ+QmIXB +ZGBXkLaqt2QGxlwUGlvn+nKuTsp8VL1APIlKdMpvMW59uz1ycZHMeTJGAMtZw8Qm +kxG62kqnDYeZ6oWwinY3wYP4UmqFSWIfcHMfBwED4uOC//r9H1bO+JRFMwOxqSN7 +kGfFJoV5eOvMOwRnXPJiPpnQEHPEkp/TAl2ANHWzdXy9TifiHOvTln3NXQVpznnW +H6f9+W36J1IE9EWktciptKUtvwY1np+G71Swa0Q4mNgb8OGf6UNJGv4vPbSlhzlr +1a5oYP59eHO3XqANcuKyTFxfja+rgrMldufZFCk1hSnBdAic/jaHrhIQSLcTGFiJ +QVyiC2VlO2eZCkCTfoSlolwgzzoY4wNumLkCDQRmOl6XARAAt+N+djFZOuJdLcSz +pz6nG88gxLmPwf+Xlhv2+xDS3wyM1OWmDAkeMDNq8OuZMes6ZXwRxDvSj7w7dlE6 +dQ1BlDz4RP4GoYG++dnPlHp/NWQ8I/eW8XC5uxkvl56YG/0DudoTLb5nxHtv+kpm +p+eVCqWRYI5RQPdcxEZzXEije+aEj2aMRQ8cO7RAgTamRWXr+XsRkSypZ8ttTISr +u+UuQPKT6XRMtkB2i8ekwO+jIK/mMrAteIF/cK0jv2JTlYmWrBtmGgYjHZHlzZak +/MzWN4tU5VbJMMXa9wHicZS0/cPV9Fz3dnR0sBVgaIDsK+/vRGxHd/LGFtXH+Wrp +pPMaR4FHCx3r44aL17B5lJocwf7Xma2gavOl80NR+a8iOW6biKdlALRZKX4G4cJj +1vnWHDJceZOuFWMVIs7zfJymvQpROCRED3q1el+zCICnLtBue6ikqv7nfyBNCaR2 +qZhw4TPMzzGTRIdKIalcSTi+bGfSYTsU2kVDBbH+0nD5I7Tx62H4shsJtgmwyP4R +q2dxJPpC4i+L09crjyl7rYvwHu4QU8vxcQXN4cH4O5pKOr2GoGnV8Y7kpZaRUo6w +/Q/Rx3I3UKAyYJv0R1mK4AifM0JzMkqxAUvUdUbs2obRT04sxtr1bA+9dLEv4b8c +YGKmRgt96GCNx1XZ8Q+FPdmsaO0AEQEAAYkCNgQYAQgAIBYhBE3JZ2zvmoPpj8oC +eE1mIIQ82H9aBQJmOl6XAhsMAAoJEE1mIIQ82H9aBfAQAKf6xHNuKibXcRMwqmcx +rx18d0dbeMEjrPqSe5vGOylLQZRpwZmKwflU9kZgOU2WRuqZsaPE0w5wxhsNDe8s +UqxW08xB6v8BVj6BT9umJQNyQF5CrsjkZe2EtmYlbdNmt4t8DMNEmhhasEglWUui +0se3I0wIwDaYAW+KppwzweO8SrUZVaB6QhOckRFhz/1wCNyc2Yp90OjWjuATffOE +ZWSeGPn9GCbtJ+SPtLtMUlxy/BoRA6OWv6H5VAt6pJVw3XPP/o450i7lYxbmbv8W +qm5/8nWx1XBvTvOxGoT9h+45bWjLTXtJJ2RhEftGHZ9439VSgssXBl+S/yjpnHOa +14tRCVABP8bgAQ7HEKZ9YyII6MOAEzNa2gNVKr7+gwB1ddrGdzx6TrIUwRlgilDJ +XORdEON4Ssx31Y1+Dt+d4lkkGu5Ymkj8iFIeH6FNOnFWM/stTmL0fE4IGpWbUHc+ +nqz7zEgili8TanLQRUmz9ClVJTG4G9t31FYF8nNzDPxug9oSMJXBfVlzhRMRZH3z +t/XdxNFHyu7rzXidiXTJSmujeqS++mKcXxx02m+V2qfwkAwnt6OS9NDLPVrzuuMN 
+NDfY3Gr4dTCbd+JQxtC0w4GuUV1V3lcOwyEjPKJVYuZwUl0UspRbNmtsaybRbzVs ++q68az33WU5++zSuqrU3fIRp +=1zLb +-END PGP PUBLIC KEY BLOCK- +
svn commit: r69507 - /dev/spark/v4.0.0-preview1-rc3-bin/ /release/spark/spark-4.0.0-preview1/
Author: wenchen Date: Mon Jun 3 00:59:50 2024 New Revision: 69507 Log: Apache Spark 4.0.0-preview1 Added: release/spark/spark-4.0.0-preview1/ - copied from r69506, dev/spark/v4.0.0-preview1-rc3-bin/ Removed: dev/spark/v4.0.0-preview1-rc3-bin/
svn commit: r69506 - /dev/spark/v4.0.0-preview1-rc3-docs/
Author: wenchen Date: Mon Jun 3 00:59:48 2024 New Revision: 69506 Log: Remove RC artifacts Removed: dev/spark/v4.0.0-preview1-rc3-docs/
(spark) tag v4.0.0-preview1 created (now 7a7a8bc4bab5)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to tag v4.0.0-preview1 in repository https://gitbox.apache.org/repos/asf/spark.git at 7a7a8bc4bab5 (commit) No new revisions were added by this update.
(spark) branch master updated (96365c86962b -> 3cd35f8cb646)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 96365c86962b [SPARK-48465][SQL] Avoid no-op empty relation propagation add 3cd35f8cb646 [SPARK-48391][CORE] Using addAll instead of add function in fromAccumulatorInfos method of TaskMetrics Class No new revisions were added by this update. Summary of changes: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-)
(spark) branch branch-3.5 updated: [SPARK-48391][CORE] Using addAll instead of add function in fromAccumulatorInfos method of TaskMetrics Class
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 744b070fa964 [SPARK-48391][CORE] Using addAll instead of add function in fromAccumulatorInfos method of TaskMetrics Class 744b070fa964 is described below commit 744b070fa964dee9e5460a24f88f22c3af8170dc Author: Dereck Li AuthorDate: Fri May 31 15:56:05 2024 -0700 [SPARK-48391][CORE] Using addAll instead of add function in fromAccumulatorInfos method of TaskMetrics Class ### What changes were proposed in this pull request? Use addAll instead of the add function in the fromAccumulators method of TaskMetrics. ### Why are the changes needed? To improve performance. In the fromAccumulators method of TaskMetrics, we should use `tm._externalAccums.addAll` instead of `tm._externalAccums.add`, as _externalAccums is an instance of CopyOnWriteArrayList. ### Does this PR introduce _any_ user-facing change? Yes. ### How was this patch tested? No tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46705 from monkeyboy123/fromAccumulators-accelerate. Authored-by: Dereck Li Signed-off-by: Wenchen Fan (cherry picked from commit 3cd35f8cb6462051c621cf49de54b9c5692aae1d) Signed-off-by: Wenchen Fan --- core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala b/core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala index 78b39b0cbda6..d446104cb642 100644 --- a/core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala +++ b/core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala @@ -328,16 +328,19 @@ private[spark] object TaskMetrics extends Logging { */ def fromAccumulators(accums: Seq[AccumulatorV2[_, _]]): TaskMetrics = { val tm = new TaskMetrics +val externalAccums = new java.util.ArrayList[AccumulatorV2[Any, Any]]() for (acc <- accums) { val name = acc.name + val tmpAcc = acc.asInstanceOf[AccumulatorV2[Any, Any]] if (name.isDefined && tm.nameToAccums.contains(name.get)) { val tmAcc = tm.nameToAccums(name.get).asInstanceOf[AccumulatorV2[Any, Any]] tmAcc.metadata = acc.metadata -tmAcc.merge(acc.asInstanceOf[AccumulatorV2[Any, Any]]) +tmAcc.merge(tmpAcc) } else { -tm._externalAccums.add(acc) +externalAccums.add(tmpAcc) } } +tm._externalAccums.addAll(externalAccums) tm } }
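A standalone sketch (plain JVM collections, not Spark internals) of why this helps: `CopyOnWriteArrayList.add` copies the whole backing array on every call, so N individual adds cost O(N²) copying, while buffering in an `ArrayList` and calling `addAll` once performs a single copy:

```
import java.util.concurrent.CopyOnWriteArrayList

val items = 1 to 10000

// Old pattern: one full array copy per element, quadratic total work.
val slow = new CopyOnWriteArrayList[Integer]()
items.foreach(i => slow.add(i))

// New pattern, mirroring the fix: buffer locally, then copy once.
val buffer = new java.util.ArrayList[Integer]()
items.foreach(i => buffer.add(i))
val fast = new CopyOnWriteArrayList[Integer]()
fast.addAll(buffer)

assert(slow.size == fast.size)
```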
(spark) branch master updated (844821c82da5 -> 96365c86962b)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 844821c82da5 [SPARK-47578][R] Migrate RPackageUtils with variables to structured logging framework add 96365c86962b [SPARK-48465][SQL] Avoid no-op empty relation propagation No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/optimizer/PropagateEmptyRelation.scala | 6 -- .../spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala | 2 +- 2 files changed, 5 insertions(+), 3 deletions(-)
(spark) branch master updated (747437c80aa8 -> f083e61925e9)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 747437c80aa8 [SPARK-48476][SQL] fix NPE error message for null delmiter csv add f083e61925e9 [SPARK-48430][SQL] Fix map value extraction when map contains collated strings No new revisions were added by this update. Summary of changes: .../sql/catalyst/analysis/CollationTypeCasts.scala | 20 ++--- .../spark/sql/catalyst/analysis/TypeCoercion.scala | 2 +- .../spark/sql/catalyst/expressions/misc.scala | 4 ++-- .../spark/sql/CollationSQLExpressionsSuite.scala | 23 .../org/apache/spark/sql/CollationSuite.scala | 25 -- 5 files changed, 56 insertions(+), 18 deletions(-)
(spark) branch master updated: [SPARK-48476][SQL] fix NPE error message for null delmiter csv
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 747437c80aa8 [SPARK-48476][SQL] fix NPE error message for null delmiter csv 747437c80aa8 is described below commit 747437c80aa875844f41ac61a419443af9f3b4b2 Author: milastdbx AuthorDate: Fri May 31 09:10:38 2024 -0700 [SPARK-48476][SQL] fix NPE error message for null delmiter csv ### What changes were proposed in this pull request? In this pull request I propose we throw a proper error code when a customer specifies null as a delimiter for CSV. Currently we throw an NPE. ### Why are the changes needed? To make Spark more user-friendly. ### Does this PR introduce _any_ user-facing change? Yes, customers will now get the INVALID_DELIMITER_VALUE.NULL_VALUE error class when they specify null as the CSV delimiter. ### How was this patch tested? Unit test ### Was this patch authored or co-authored using generative AI tooling? No Closes #46810 from milastdbx/dev/milast/fixNPEForDelimiterCSV. Authored-by: milastdbx Signed-off-by: Wenchen Fan --- common/utils/src/main/resources/error/error-conditions.json | 5 + .../scala/org/apache/spark/sql/catalyst/csv/CSVExprUtils.scala | 5 + .../org/apache/spark/sql/catalyst/csv/CSVExprUtilsSuite.scala| 9 + 3 files changed, 19 insertions(+) diff --git a/common/utils/src/main/resources/error/error-conditions.json b/common/utils/src/main/resources/error/error-conditions.json index 3914c0f177dc..3dd7a6d65d7f 100644 --- a/common/utils/src/main/resources/error/error-conditions.json +++ b/common/utils/src/main/resources/error/error-conditions.json @@ -2021,6 +2021,11 @@ "Delimiter cannot be empty string." ] }, + "NULL_VALUE" : { +"message" : [ + "Delimiter cannot be null." +] + }, "SINGLE_BACKSLASH" : { "message" : [ "Single backslash is prohibited. It has special meaning as beginning of an escape sequence. To get the backslash character, pass a string with two backslashes as the delimiter."
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtils.scala index 62638d70dd90..7b6664a4117a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtils.scala @@ -120,6 +120,11 @@ object CSVExprUtils { * @throws SparkIllegalArgumentException if any of the individual input chunks are illegal */ def toDelimiterStr(str: String): String = { +if (str == null) { + throw new SparkIllegalArgumentException( +errorClass = "INVALID_DELIMITER_VALUE.NULL_VALUE") +} + var idx = 0 var delimiter = "" diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtilsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtilsSuite.scala index 2e94c723a6f2..d4b68500e078 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtilsSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtilsSuite.scala @@ -33,6 +33,15 @@ class CSVExprUtilsSuite extends SparkFunSuite { assert(CSVExprUtils.toChar("""\\""") === '\\') } + test("Does not accept null delimiter") { +checkError( + exception = intercept[SparkIllegalArgumentException]{ +CSVExprUtils.toDelimiterStr(null) + }, + errorClass = "INVALID_DELIMITER_VALUE.NULL_VALUE", + parameters = Map.empty) + } + test("Does not accept delimiter larger than one character") { checkError( exception = intercept[SparkIllegalArgumentException]{
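A hedged sketch of the new behavior, mirroring the added unit test. Note that `CSVExprUtils` lives in a catalyst-internal package, so this is illustrative rather than a supported public API:

```
import org.apache.spark.SparkIllegalArgumentException
import org.apache.spark.sql.catalyst.csv.CSVExprUtils

// Before this patch the call below died with a bare NullPointerException; now
// it raises a structured error carrying INVALID_DELIMITER_VALUE.NULL_VALUE.
try {
  CSVExprUtils.toDelimiterStr(null)
} catch {
  case e: SparkIllegalArgumentException => println(e.getMessage)
}
```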
(spark) branch master updated: [SPARK-48419][SQL] Foldable propagation replace foldable column shoul…
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3e27543128c8 [SPARK-48419][SQL] Foldable propagation replace foldable column shoul… 3e27543128c8 is described below commit 3e27543128c84bb4b6642589bb1c6da21c38b957 Author: KnightChess <981159...@qq.com> AuthorDate: Thu May 30 17:53:38 2024 -0700 [SPARK-48419][SQL] Foldable propagation replace foldable column shoul… …d use origin column name ### What changes were proposed in this pull request? Fix the optimizer rule `FoldablePropagation` so that it no longer changes column names; the original column name is used instead. ### Why are the changes needed? Fix a bug. ### Does this PR introduce _any_ user-facing change? `before fix`: before the optimizer: ```shell 'Project ['x, 'y, 'z] +- 'Project ['a AS x, str AS Y, 'b AS z] +- LocalRelation , [a, b] ``` after the optimizer: ```shell Project [x, str AS Y, z] +- Project [a#0 AS x#112, str AS Y#113, b#1 AS z#114] +- LocalRelation , [a, b] ``` The column name `y` would be replaced with 'Y', which changes the plan schema. `after fix`: the query plan schema is still `y`: ```shell Project [x, str AS y, z] +- Project [a#0 AS x#112, str AS Y#113, b#1 AS z#114] +- LocalRelation , [a, b] ``` ### How was this patch tested? Added UT ### Was this patch authored or co-authored using generative AI tooling? No Closes #46742 from KnightChess/fix-foldable-propagation. Authored-by: KnightChess <981159...@qq.com> Signed-off-by: Wenchen Fan --- .../org/apache/spark/sql/catalyst/optimizer/expressions.scala | 2 +- .../sql/catalyst/optimizer/FoldablePropagationSuite.scala | 11 +++ 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala index 32700f176f25..2c55e4c8fd37 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala @@ -1023,7 +1023,7 @@ object FoldablePropagation extends Rule[LogicalPlan] { plan } else { plan transformExpressions { -case a: AttributeReference if foldableMap.contains(a) => foldableMap(a) +case a: AttributeReference if foldableMap.contains(a) => foldableMap(a).withName(a.name) } } } diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FoldablePropagationSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FoldablePropagationSuite.scala index 767ef38ea7f7..5866f29e4e86 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FoldablePropagationSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FoldablePropagationSuite.scala @@ -214,4 +214,15 @@ class FoldablePropagationSuite extends PlanTest { val expected = testRelation.select(foldableAttr, $"a").rebalance(foldableAttr, $"a").analyze comparePlans(optimized, expected) } + + test("SPARK-48419: Foldable propagation replace foldable column should use origin column name") { +val query = testRelation + .select($"a".as("x"), "str".as("Y"), $"b".as("z")) + .select($"x", $"y", $"z") +val optimized = Optimize.execute(query.analyze) +val correctAnswer = testRelation + .select($"a".as("x"), "str".as("Y"), $"b".as("z")) + .select($"x", "str".as("y"), $"z").analyze +comparePlans(optimized, correctAnswer) + } }
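The fix is observable from a user session; a minimal sketch (column names and data are illustrative, not from the PR):

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, 2)).toDF("a", "b")
  .select(col("a").as("x"), lit("str").as("Y"), col("b").as("z"))
  .select(col("x"), col("y"), col("z")) // `y` resolves to `Y` case-insensitively

// Before the fix, FoldablePropagation substituted the literal alias `Y` back
// in, so the optimized plan surfaced column `Y`; with .withName(a.name) the
// optimized plan keeps the requested name `y`.
println(df.queryExecution.optimizedPlan.schema.fieldNames.mkString(", "))
```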
(spark) branch master updated: [SPARK-48468] Add LogicalQueryStage interface in catalyst
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6b4f97e1411c [SPARK-48468] Add LogicalQueryStage interface in catalyst 6b4f97e1411c is described below commit 6b4f97e1411c223b77e7bbc4b46a5f399c39823e Author: Ziqi Liu AuthorDate: Thu May 30 14:10:18 2024 -0700 [SPARK-48468] Add LogicalQueryStage interface in catalyst ### What changes were proposed in this pull request? Adding `LogicalQueryStage` interface in catalyst, and `org.apache.spark.sql.execution.adaptive.LogicalQueryStage` inherits from `logical.LogicalQueryStage` ### Why are the changes needed? Make LogicalQueryStage visible in logical rewrites. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? Existing tests. ### Was this patch authored or co-authored using generative AI tooling? no Closes #46799 from liuzqt/SPARK-48468. Authored-by: Ziqi Liu Signed-off-by: Wenchen Fan --- .../sql/catalyst/plans/logical/LogicalPlan.scala | 28 ++ .../sql/execution/adaptive/LogicalQueryStage.scala | 17 ++--- 2 files changed, 42 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala index 98e91585c2a0..a2ede8ac735c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala @@ -25,6 +25,7 @@ import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.plans.{AliasAwareQueryOutputOrdering, QueryPlan} import org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats import org.apache.spark.sql.catalyst.trees.{BinaryLike, LeafLike, TreeNodeTag, UnaryLike} +import org.apache.spark.sql.catalyst.trees.TreePattern.{LOGICAL_QUERY_STAGE, TreePattern} import org.apache.spark.sql.catalyst.types.DataTypeUtils import org.apache.spark.sql.catalyst.util.MetadataColumnHelper import org.apache.spark.sql.errors.{QueryCompilationErrors, QueryExecutionErrors} @@ -214,6 +215,33 @@ trait LeafNode extends LogicalPlan with LeafLike[LogicalPlan] { throw new SparkUnsupportedOperationException("_LEGACY_ERROR_TEMP_3114") } +/** + * A abstract class for LogicalQueryStage that is visible in logical rewrites. + */ +abstract class LogicalQueryStage extends LeafNode { + override protected val nodePatterns: Seq[TreePattern] = Seq(LOGICAL_QUERY_STAGE) + + /** + * Returns the logical plan that is included in this query stage + */ + def logicalPlan: LogicalPlan + + /** + * Returns the physical plan. + */ + def physicalPlan: QueryPlan[_] + + /** + * Return true if the physical stage is materialized + */ + def isMaterialized: Boolean + + /** + * Return true if the physical plan corresponds directly to a stage + */ + def isDirectStage: Boolean +} + /** * A logical plan node with single child. 
*/ diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/LogicalQueryStage.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/LogicalQueryStage.scala index 8ce2452cc141..506f52fd9072 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/LogicalQueryStage.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/LogicalQueryStage.scala @@ -18,7 +18,8 @@ package org.apache.spark.sql.execution.adaptive import org.apache.spark.sql.catalyst.expressions.{Attribute, SortOrder} -import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, RepartitionOperation, Statistics} +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, RepartitionOperation, Statistics} +import org.apache.spark.sql.catalyst.plans.logical import org.apache.spark.sql.catalyst.trees.TreePattern.{LOGICAL_QUERY_STAGE, REPARTITION_OPERATION, TreePattern} import org.apache.spark.sql.execution.SparkPlan import org.apache.spark.sql.execution.aggregate.BaseAggregateExec @@ -35,8 +36,8 @@ import org.apache.spark.sql.execution.aggregate.BaseAggregateExec // TODO we can potentially include only [[QueryStageExec]] in this class if we make the aggregation // planning aware of partitioning. case class LogicalQueryStage( -logicalPlan: LogicalPlan, -physicalPlan: SparkPlan) extends LeafNode { +override val logicalPlan: LogicalPlan, +override val physicalPlan: SparkPlan) extends logical.LogicalQueryStage { override def output: Seq[Attribute] = logicalPlan.output override val isStreaming: Boolean = logicalPlan.isStreamin
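A sketch of what the new abstraction enables, using a hypothetical catalyst-side rule (the rule itself is illustrative and not part of this PR):

```
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, LogicalQueryStage}
import org.apache.spark.sql.catalyst.rules.Rule

// A logical rewrite can now inspect AQE query stages without depending on
// sql/core, e.g. to treat materialized stages (whose statistics are exact)
// differently from stages that have not run yet.
object CountMaterializedStages extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = {
    val materialized = plan.collect {
      case s: LogicalQueryStage if s.isMaterialized => s
    }
    logInfo(s"Plan contains ${materialized.size} materialized query stage(s)")
    plan // inspection only; a real rule would rewrite here
  }
}
```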
(spark) branch master updated: [SPARK-48477][SQL][TESTS] Use withSQLConf in tests: Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e1f5a7c856ab [SPARK-48477][SQL][TESTS] Use withSQLConf in tests: Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite e1f5a7c856ab is described below commit e1f5a7c856ab7ed4bf03e490ee7c1307775a Author: Rui Wang AuthorDate: Thu May 30 14:07:10 2024 -0700 [SPARK-48477][SQL][TESTS] Use withSQLConf in tests: Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite ### What changes were proposed in this pull request? Use withSQLConf in tests when it is appropriate. ### Why are the changes needed? Enforce good practice for setting config in test cases. ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? Existing UT ### Was this patch authored or co-authored using generative AI tooling? NO Closes #46812 from amaliujia/sql_config_4. Authored-by: Rui Wang Signed-off-by: Wenchen Fan --- .../org/apache/spark/sql/CollationSuite.scala | 16 +-- .../execution/CoalesceShufflePartitionsSuite.scala | 128 +++-- .../spark/sql/execution/SQLExecutionSuite.scala| 9 +- 3 files changed, 78 insertions(+), 75 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala index 9b3bfe1c77b3..42da779b84ad 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala @@ -677,14 +677,14 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlanHelper { sql(s"INSERT INTO $tableName VALUES ('bbb', 'bbb')") sql(s"INSERT INTO $tableName VALUES ('BBB', 'BBB')") - sql(s"SET spark.sql.legacy.createHiveTableByDefault=false") - - withTable(newTableName) { -checkError( - exception = intercept[AnalysisException] { -sql(s"CREATE TABLE $newTableName AS SELECT c1 || c2 FROM $tableName") - }, - errorClass = "COLLATION_MISMATCH.IMPLICIT") + withSQLConf("spark.sql.legacy.createHiveTableByDefault" -> "false") { +withTable(newTableName) { + checkError( +exception = intercept[AnalysisException] { + sql(s"CREATE TABLE $newTableName AS SELECT c1 || c2 FROM $tableName") +}, +errorClass = "COLLATION_MISMATCH.IMPLICIT") +} } } } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/CoalesceShufflePartitionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/CoalesceShufflePartitionsSuite.scala index e87b90dfdd84..dc72b4a092ae 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/CoalesceShufflePartitionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/CoalesceShufflePartitionsSuite.scala @@ -21,6 +21,7 @@ import org.apache.spark.{SparkConf, SparkFunSuite} import org.apache.spark.internal.config.IO_ENCRYPTION_ENABLED import org.apache.spark.internal.config.UI.UI_ENABLED import org.apache.spark.sql._ +import org.apache.spark.sql.catalyst.SQLConfHelper import org.apache.spark.sql.execution.adaptive._ import org.apache.spark.sql.execution.adaptive.AQEShuffleReadExec import org.apache.spark.sql.execution.exchange.ReusedExchangeExec @@ -28,7 +29,7 @@ import org.apache.spark.sql.functions._ import org.apache.spark.sql.internal.SQLConf import org.apache.spark.util.ArrayImplicits._ -class CoalesceShufflePartitionsSuite extends SparkFunSuite { +class 
CoalesceShufflePartitionsSuite extends SparkFunSuite with SQLConfHelper { private var originalActiveSparkSession: Option[SparkSession] = _ private var originalInstantiatedSparkSession: Option[SparkSession] = _ @@ -374,72 +375,73 @@ class CoalesceShufflePartitionsSuite extends SparkFunSuite { test("SPARK-24705 adaptive query execution works correctly when exchange reuse enabled") { val test: SparkSession => Unit = { spark: SparkSession => - spark.sql("SET spark.sql.exchange.reuse=true") - val df = spark.range(0, 6, 1).selectExpr("id AS key", "id AS value") - - // test case 1: a query stage has 3 child stages but they are the same stage. - // Final Stage 1 - // ShuffleQueryStage 0 - // ReusedQueryStage 0 - // ReusedQueryStage 0 - val resultDf = df.join(df, "key").join(df, "key") - QueryTest.checkAnswer(resultDf, (0 to 5).map(i => Row(i, i, i, i))) -
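The pattern this refactor enforces, as a minimal hypothetical suite (`SQLConfHelper` supplies both `conf` and `withSQLConf`; the suite name is made up):

```
import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.catalyst.SQLConfHelper
import org.apache.spark.sql.internal.SQLConf

class ScopedConfSuite extends SparkFunSuite with SQLConfHelper {
  test("config change is scoped and restored") {
    val original = conf.numShufflePartitions
    withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "5") {
      assert(conf.numShufflePartitions == 5) // visible only inside the block
    }
    assert(conf.numShufflePartitions == original) // restored, unlike a raw SET
  }
}
```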
(spark) branch master updated (69afd4be9c93 -> f68d761c9b21)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 69afd4be9c93 [SPARK-47361][SQL] Derby: Calculate suitable precision and scale for DECIMAL type add f68d761c9b21 [SPARK-48292][CORE] Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status No new revisions were added by this update. Summary of changes: .../main/scala/org/apache/spark/SparkContext.scala | 7 +- .../src/main/scala/org/apache/spark/SparkEnv.scala | 12 +-- .../spark/scheduler/OutputCommitCoordinator.scala | 12 +-- .../OutputCommitCoordinatorIntegrationSuite.scala | 11 ++- .../scheduler/OutputCommitCoordinatorSuite.scala | 19 +++-- .../datasources/parquet/ParquetIOSuite.scala | 85 ++ 6 files changed, 58 insertions(+), 88 deletions(-)
(spark) branch branch-3.5 updated: [SPARK-41049][SQL][FOLLOW-UP][3.5] stateful expressions test uses different pretty name
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new c87b6483a3e0 [SPARK-41049][SQL][FOLLOW-UP][3.5] stateful expressions test uses different pretty name c87b6483a3e0 is described below commit c87b6483a3e0690be2b267e6dcf93a3edd63b030 Author: Rui Wang AuthorDate: Wed May 29 17:15:17 2024 -0700 [SPARK-41049][SQL][FOLLOW-UP][3.5] stateful expressions test uses different pretty name ### What changes were proposed in this pull request? We need to use a different pretty string for the stateful expression test case in branch-3.5. ### Why are the changes needed? Fix the failing test case. ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? Existing UT ### Was this patch authored or co-authored using generative AI tooling? NO Closes #46795 from amaliujia/branch-3.5. Authored-by: Rui Wang Signed-off-by: Wenchen Fan --- sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala index 260ecaa5ece1..7ee18df37561 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala @@ -3641,8 +3641,9 @@ class DataFrameSuite extends QueryTest val v4 = to_csv(struct(v3.as("a"))) // to_csv is CodegenFallback df.select(v3, v3, v4, v4).collect().foreach { row => assert(row.getMap(0).toString() == row.getMap(1).toString()) - assert(row.getString(2) == s"{key -> ${row.getMap(0).get("key").get}}") - assert(row.getString(3) == s"{key -> ${row.getMap(0).get("key").get}}") + val expectedString = s"keys: [key], values: [${row.getMap(0).get("key").get}]" + assert(row.getString(2) == s"""\"$expectedString\"""") + assert(row.getString(3) == s"""\"$expectedString\"""") } }
(spark) branch master updated: [SPARK-48431][SQL] Do not forward predicates on collated columns to file readers
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a3b8420e5eec [SPARK-48431][SQL] Do not forward predicates on collated columns to file readers a3b8420e5eec is described below commit a3b8420e5eecc3ce33528bc7c73967a64b1f670e Author: Ole Sasse AuthorDate: Wed May 29 13:52:33 2024 -0700 [SPARK-48431][SQL] Do not forward predicates on collated columns to file readers ### What changes were proposed in this pull request? [SPARK-47657](https://issues.apache.org/jira/browse/SPARK-47657) allows to push filters on collated columns to file sources that support it. If such filters are pushed to file sources, those file sources must not push those filters to the actual file readers (i.e. parquet or csv readers), because there is no guarantee that those support collations. In this PR we are widening filters on collations to be AlwaysTrue when we translate filters for file sources. ### Why are the changes needed? Without this, no file source can implement filter pushdown ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added unit tests. No component tests are possible because there is no file source with filter pushdown yet. ### Was this patch authored or co-authored using generative AI tooling? No Closes #46760 from olaky/filter-translation-for-collations. Authored-by: Ole Sasse Signed-off-by: Wenchen Fan --- .../execution/datasources/DataSourceStrategy.scala | 31 +--- .../datasources/DataSourceStrategySuite.scala | 55 +- 2 files changed, 78 insertions(+), 8 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala index 22b60caf2669..7cda347ce581 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala @@ -54,7 +54,7 @@ import org.apache.spark.sql.execution.streaming.StreamingRelation import org.apache.spark.sql.sources._ import org.apache.spark.sql.types._ import org.apache.spark.sql.util.{PartitioningUtils => CatalystPartitioningUtils} -import org.apache.spark.sql.util.CaseInsensitiveStringMap +import org.apache.spark.sql.util.{CaseInsensitiveStringMap, SchemaUtils} import org.apache.spark.unsafe.types.UTF8String /** @@ -595,6 +595,16 @@ object DataSourceStrategy translatedFilterToExpr: Option[mutable.HashMap[sources.Filter, Expression]], nestedPredicatePushdownEnabled: Boolean) : Option[Filter] = { + +def translateAndRecordLeafNodeFilter(filter: Expression): Option[Filter] = { + val translatedFilter = +translateLeafNodeFilter(filter, PushableColumn(nestedPredicatePushdownEnabled)) + if (translatedFilter.isDefined && translatedFilterToExpr.isDefined) { +translatedFilterToExpr.get(translatedFilter.get) = predicate + } + translatedFilter +} + predicate match { case expressions.And(left, right) => // See SPARK-12218 for detailed discussion @@ -621,16 +631,25 @@ object DataSourceStrategy right, translatedFilterToExpr, nestedPredicatePushdownEnabled) } yield sources.Or(leftFilter, rightFilter) + case notNull @ expressions.IsNotNull(_: AttributeReference) => +// Not null filters on attribute references can always be pushed, also for collated columns. 
+translateAndRecordLeafNodeFilter(notNull) + + case isNull @ expressions.IsNull(_: AttributeReference) => +// Is null filters on attribute references can always be pushed, also for collated columns. +translateAndRecordLeafNodeFilter(isNull) + + case p if p.references.exists(ref => SchemaUtils.hasNonUTF8BinaryCollation(ref.dataType)) => +// The filter cannot be pushed and we widen it to be AlwaysTrue(). This is only valid if +// the result of the filter is not negated by a Not expression it is wrapped in. +translateAndRecordLeafNodeFilter(Literal.TrueLiteral) + case expressions.Not(child) => translateFilterWithMapping(child, translatedFilterToExpr, nestedPredicatePushdownEnabled) .map(sources.Not) case other => -val filter = translateLeafNodeFilter(other, PushableColumn(nestedPredicatePushdownEnabled)) -if (filter.isDefined && translatedFilterToExpr.isDefined) { - translatedFilterToExpr.get(filter.get) = predicate -} -filter +translateAndRecordLeafNo
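Why widening an un-pushable filter to `AlwaysTrue` is sound, modeled with plain collections (a conceptual model, not Spark code): the reader may return a superset of rows, because the engine still applies the full predicate on top:

```
// Reader side: the collation-aware filter was widened to "always true",
// so every row comes back from the file reader.
final case class Rec(c: String)
val data = Seq(Rec("abc"), Rec("ABC"), Rec(null))

val fromReader = data.filter(_ => true) // the widened filter

// Engine side: the real predicate still runs; equalsIgnoreCase stands in for
// an equality under a lowercase collation such as UTF8_LCASE.
val result = fromReader.filter(r => r.c != null && r.c.equalsIgnoreCase("abc"))

assert(result.size == 2) // "abc" and "ABC" match; correctness is preserved
```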
(spark) branch branch-3.5 updated: [SPARK-48273][SQL][FOLLOWUP] Explicitly create non-Hive table in identifier-clause.sql
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 043944e1b549 [SPARK-48273][SQL][FOLLOWUP] Explicitly create non-Hive table in identifier-clause.sql 043944e1b549 is described below commit 043944e1b54902f6d8204a5610e8eb780f1fe753 Author: Wenchen Fan AuthorDate: Wed May 29 13:35:01 2024 -0700 [SPARK-48273][SQL][FOLLOWUP] Explicitly create non-Hive table in identifier-clause.sql ### What changes were proposed in this pull request? A followup of https://github.com/apache/spark/pull/46580 . It's better to create non-Hive tables in the tests, so that it's backport safe, as old branches creates hive table by default. ### Why are the changes needed? fix branch-3.5 CI ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? no Closes #46794 from cloud-fan/test. Authored-by: Wenchen Fan Signed-off-by: Wenchen Fan (cherry picked from commit cf47293b5fc7c80d19e50fda44a01f91d5e34530) Signed-off-by: Wenchen Fan --- .../sql-tests/analyzer-results/identifier-clause.sql.out | 8 .../src/test/resources/sql-tests/inputs/identifier-clause.sql | 6 +++--- .../test/resources/sql-tests/results/identifier-clause.sql.out| 6 +++--- 3 files changed, 10 insertions(+), 10 deletions(-) diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/identifier-clause.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/identifier-clause.sql.out index 823ce43247a7..9b56a172e59d 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/identifier-clause.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/identifier-clause.sql.out @@ -687,7 +687,7 @@ org.apache.spark.sql.AnalysisException -- !query -CREATE TABLE IDENTIFIER(1)(c1 INT) +CREATE TABLE IDENTIFIER(1)(c1 INT) USING csv -- !query analysis org.apache.spark.sql.AnalysisException { @@ -709,7 +709,7 @@ org.apache.spark.sql.AnalysisException -- !query -CREATE TABLE IDENTIFIER('a.b.c')(c1 INT) +CREATE TABLE IDENTIFIER('a.b.c')(c1 INT) USING csv -- !query analysis org.apache.spark.sql.AnalysisException { @@ -902,7 +902,7 @@ CacheTableAsSelect t1, (select my_col from (values (1), (2), (1) as (my_col)) gr -- !query -create table identifier('t2') as (select my_col from (values (1), (2), (1) as (my_col)) group by 1) +create table identifier('t2') using csv as (select my_col from (values (1), (2), (1) as (my_col)) group by 1) -- !query analysis CreateDataSourceTableAsSelectCommand `spark_catalog`.`default`.`t2`, ErrorIfExists, [my_col] +- Aggregate [my_col#x], [my_col#x] @@ -914,7 +914,7 @@ CreateDataSourceTableAsSelectCommand `spark_catalog`.`default`.`t2`, ErrorIfExis -- !query insert into identifier('t2') select my_col from (values (3) as (my_col)) group by 1 -- !query analysis -InsertIntoHadoopFsRelationCommand file:[not included in comparison]/{warehouse_dir}/t2, false, Parquet, [path=file:[not included in comparison]/{warehouse_dir}/t2], Append, `spark_catalog`.`default`.`t2`, org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:[not included in comparison]/{warehouse_dir}/t2), [my_col] +InsertIntoHadoopFsRelationCommand file:[not included in comparison]/{warehouse_dir}/t2, false, CSV, [path=file:[not included in comparison]/{warehouse_dir}/t2], Append, `spark_catalog`.`default`.`t2`, 
org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:[not included in comparison]/{warehouse_dir}/t2), [my_col] +- Aggregate [my_col#x], [my_col#x] +- SubqueryAlias __auto_generated_subquery_name +- SubqueryAlias as diff --git a/sql/core/src/test/resources/sql-tests/inputs/identifier-clause.sql b/sql/core/src/test/resources/sql-tests/inputs/identifier-clause.sql index 9e6314202b5f..e85fdf7b5da3 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/identifier-clause.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/identifier-clause.sql @@ -109,8 +109,8 @@ VALUES(IDENTIFIER(1)); VALUES(IDENTIFIER(SUBSTR('HELLO', 1, RAND() + 1))); SELECT `IDENTIFIER`('abs')(c1) FROM VALUES(-1) AS T(c1); -CREATE TABLE IDENTIFIER(1)(c1 INT); -CREATE TABLE IDENTIFIER('a.b.c')(c1 INT); +CREATE TABLE IDENTIFIER(1)(c1 INT) USING csv; +CREATE TABLE IDENTIFIER('a.b.c')(c1 INT) USING csv; CREATE VIEW IDENTIFIER('a.b.c')(c1) AS VALUES(1); DROP TABLE IDENTIFIER('a.b.c'); DROP VIEW IDENTIFIER('a.b.c'); @@ -125,7 +125,7 @@ CREATE TEMPORARY VIEW IDENTIFIER('default.v')(c1) AS VALUES(1); -- SPARK-48273: Aggregation operation in statements using identifier clause for table name create temporary view identifier('v1
(spark) branch master updated: [SPARK-48273][SQL][FOLLOWUP] Explicitly create non-Hive table in identifier-clause.sql
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new cf47293b5fc7 [SPARK-48273][SQL][FOLLOWUP] Explicitly create non-Hive table in identifier-clause.sql cf47293b5fc7 is described below commit cf47293b5fc7c80d19e50fda44a01f91d5e34530 Author: Wenchen Fan AuthorDate: Wed May 29 13:35:01 2024 -0700 [SPARK-48273][SQL][FOLLOWUP] Explicitly create non-Hive table in identifier-clause.sql ### What changes were proposed in this pull request? A followup of https://github.com/apache/spark/pull/46580 . It's better to create non-Hive tables in the tests, so that it's backport safe, as old branches creates hive table by default. ### Why are the changes needed? fix branch-3.5 CI ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? no Closes #46794 from cloud-fan/test. Authored-by: Wenchen Fan Signed-off-by: Wenchen Fan --- .../sql-tests/analyzer-results/identifier-clause.sql.out | 8 .../src/test/resources/sql-tests/inputs/identifier-clause.sql | 6 +++--- .../test/resources/sql-tests/results/identifier-clause.sql.out| 6 +++--- 3 files changed, 10 insertions(+), 10 deletions(-) diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/identifier-clause.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/identifier-clause.sql.out index f799c19a3bb8..b3e2cd5ada95 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/identifier-clause.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/identifier-clause.sql.out @@ -732,7 +732,7 @@ org.apache.spark.sql.AnalysisException -- !query -CREATE TABLE IDENTIFIER(1)(c1 INT) +CREATE TABLE IDENTIFIER(1)(c1 INT) USING csv -- !query analysis org.apache.spark.sql.AnalysisException { @@ -754,7 +754,7 @@ org.apache.spark.sql.AnalysisException -- !query -CREATE TABLE IDENTIFIER('a.b.c')(c1 INT) +CREATE TABLE IDENTIFIER('a.b.c')(c1 INT) USING csv -- !query analysis org.apache.spark.sql.AnalysisException { @@ -947,7 +947,7 @@ CacheTableAsSelect t1, (select my_col from (values (1), (2), (1) as (my_col)) gr -- !query -create table identifier('t2') as (select my_col from (values (1), (2), (1) as (my_col)) group by 1) +create table identifier('t2') using csv as (select my_col from (values (1), (2), (1) as (my_col)) group by 1) -- !query analysis CreateDataSourceTableAsSelectCommand `spark_catalog`.`default`.`t2`, ErrorIfExists, [my_col] +- Aggregate [my_col#x], [my_col#x] @@ -959,7 +959,7 @@ CreateDataSourceTableAsSelectCommand `spark_catalog`.`default`.`t2`, ErrorIfExis -- !query insert into identifier('t2') select my_col from (values (3) as (my_col)) group by 1 -- !query analysis -InsertIntoHadoopFsRelationCommand file:[not included in comparison]/{warehouse_dir}/t2, false, Parquet, [path=file:[not included in comparison]/{warehouse_dir}/t2], Append, `spark_catalog`.`default`.`t2`, org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:[not included in comparison]/{warehouse_dir}/t2), [my_col] +InsertIntoHadoopFsRelationCommand file:[not included in comparison]/{warehouse_dir}/t2, false, CSV, [path=file:[not included in comparison]/{warehouse_dir}/t2], Append, `spark_catalog`.`default`.`t2`, org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:[not included in comparison]/{warehouse_dir}/t2), [my_col] +- 
Aggregate [my_col#x], [my_col#x] +- SubqueryAlias __auto_generated_subquery_name +- SubqueryAlias as diff --git a/sql/core/src/test/resources/sql-tests/inputs/identifier-clause.sql b/sql/core/src/test/resources/sql-tests/inputs/identifier-clause.sql index 978b82c331fe..46461dcd048e 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/identifier-clause.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/identifier-clause.sql @@ -119,8 +119,8 @@ VALUES(IDENTIFIER(1)); VALUES(IDENTIFIER(SUBSTR('HELLO', 1, RAND() + 1))); SELECT `IDENTIFIER`('abs')(c1) FROM VALUES(-1) AS T(c1); -CREATE TABLE IDENTIFIER(1)(c1 INT); -CREATE TABLE IDENTIFIER('a.b.c')(c1 INT); +CREATE TABLE IDENTIFIER(1)(c1 INT) USING csv; +CREATE TABLE IDENTIFIER('a.b.c')(c1 INT) USING csv; CREATE VIEW IDENTIFIER('a.b.c')(c1) AS VALUES(1); DROP TABLE IDENTIFIER('a.b.c'); DROP VIEW IDENTIFIER('a.b.c'); @@ -135,7 +135,7 @@ CREATE TEMPORARY VIEW IDENTIFIER('default.v')(c1) AS VALUES(1); -- SPARK-48273: Aggregation operation in statements using identifier clause for table name create temporary view identifier('v1') as (select my_col from (values (1), (2), (1) as (my_col)) group by 1); cache table identifier('t1') as (select
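The backport-safe pattern, sketched as SQL run through a session (the statements mirror the updated golden files; the `spark` session setup is assumed):

```
// With an explicit USING clause the created table is a data-source table on
// every branch, regardless of spark.sql.legacy.createHiveTableByDefault.
spark.sql(
  """create table identifier('t2') using csv as
    |(select my_col from (values (1), (2), (1) as (my_col)) group by 1)""".stripMargin)
spark.sql("insert into identifier('t2') select my_col from (values (3) as (my_col)) group by 1")
```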
(spark) branch master updated (0461745f1616 -> dc6b493dd1f4)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 0461745f1616 [SPARK-48281][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringInStr, SubstringIndex) add dc6b493dd1f4 [SPARK-48462][SQL][TESTS] Use withSQLConf in tests: Refactor HiveQuerySuite and HiveTableScanSuite No new revisions were added by this update. Summary of changes: .../spark/sql/hive/execution/HiveQuerySuite.scala | 111 +++-- .../sql/hive/execution/HiveTableScanSuite.scala| 18 ++-- 2 files changed, 67 insertions(+), 62 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48281][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringInStr, SubstringIndex)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0461745f1616 [SPARK-48281][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringInStr, SubstringIndex) 0461745f1616 is described below commit 0461745f161692c7ad2bc0e418c4e5fb75f71ef5 Author: Uros Bojanic <157381213+uros...@users.noreply.github.com> AuthorDate: Wed May 29 11:16:37 2024 -0700 [SPARK-48281][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringInStr, SubstringIndex) ### What changes were proposed in this pull request? String searching in UTF8_BINARY_LCASE now works on character-level, rather than on byte-level. For example: `instr("İ", "i")`; now returns 0, because there exists no `start, len` such that `lowercase(substring("İ", start, len)) == "i"`. ### Why are the changes needed? Fix functions that give unusable results due to one-to-many case mapping when performing string search under UTF8_BINARY_LCASE (see example above). ### Does this PR introduce _any_ user-facing change? Yes, behaviour of `instr` and `substring_index` expressions is changed for edge cases with one-to-many case mapping. ### How was this patch tested? New unit tests in `CollationSupportSuite`. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46589 from uros-db/alter-lcase-vol2. Authored-by: Uros Bojanic <157381213+uros...@users.noreply.github.com> Signed-off-by: Wenchen Fan --- .../catalyst/util/CollationAwareUTF8String.java| 48 ++--- .../spark/sql/catalyst/util/CollationSupport.java | 2 +- .../org/apache/spark/unsafe/types/UTF8String.java | 13 +- .../spark/unsafe/types/CollationSupportSuite.java | 50 +- 4 files changed, 75 insertions(+), 38 deletions(-) diff --git a/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java b/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java index 0d0094d8d0a0..a6e96003ec34 100644 --- a/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java +++ b/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java @@ -345,14 +345,14 @@ public class CollationAwareUTF8String { */ public static int lowercaseIndexOf(final UTF8String target, final UTF8String pattern, final int start) { -if (pattern.numChars() == 0) return 0; +if (pattern.numChars() == 0) return target.indexOfEmpty(start); return lowercaseFind(target, pattern.toLowerCase(), start); } public static int indexOf(final UTF8String target, final UTF8String pattern, final int start, final int collationId) { if (pattern.numBytes() == 0) { - return 0; + return target.indexOfEmpty(start); } StringSearch stringSearch = CollationFactory.getStringSearch(target, pattern, collationId); @@ -444,47 +444,27 @@ public class CollationAwareUTF8String { return UTF8String.EMPTY_UTF8; } -UTF8String lowercaseString = string.toLowerCase(); UTF8String lowercaseDelimiter = delimiter.toLowerCase(); if (count > 0) { - int idx = -1; + // Search left to right (note: the start code point is inclusive). 
+ int matchLength = -1; while (count > 0) { -idx = lowercaseString.find(lowercaseDelimiter, idx + 1); -if (idx >= 0) { - count--; -} else { - // can not find enough delim - return string; -} - } - if (idx == 0) { -return UTF8String.EMPTY_UTF8; +matchLength = lowercaseFind(string, lowercaseDelimiter, matchLength + 1); +if (matchLength > MATCH_NOT_FOUND) --count; // Found a delimiter. +else return string; // Cannot find enough delimiters in the string. } - byte[] bytes = new byte[idx]; - copyMemory(string.getBaseObject(), string.getBaseOffset(), bytes, BYTE_ARRAY_OFFSET, idx); - return UTF8String.fromBytes(bytes); - + return string.substring(0, matchLength); } else { - int idx = string.numBytes() - delimiter.numBytes() + 1; + // Search right to left (note: the end code point is exclusive). + int matchLength = string.numChars() + 1; count = -count; while (count > 0) { -idx = lowercaseString.rfind(lowercaseDelimiter, idx - 1); -if (idx >= 0) { - count--; -} else { - // can not find enough delim - return string; -} +matchLength = lowercaseRFind(string, lowercaseDelimiter, matchLength - 1); +if (matchLength > MATCH_NOT_FOUND) -
(spark) branch master updated: [SPARK-48444][SQL][TESTS] Use withSQLConf in tests: Refactor SQLQuerySuite
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 49204b10af58 [SPARK-48444][SQL][TESTS] Use withSQLConf in tests: Refactor SQLQuerySuite 49204b10af58 is described below commit 49204b10af58230af2e6d9104ad61fb81f6a0bc3 Author: Rui Wang AuthorDate: Wed May 29 10:38:33 2024 -0700 [SPARK-48444][SQL][TESTS] Use withSQLConf in tests: Refactor SQLQuerySuite ### What changes were proposed in this pull request? Use withSQLConf in tests when it is appropriate. ### Why are the changes needed? Enforce good practice for setting config in test cases. ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? Existing UT ### Was this patch authored or co-authored using generative AI tooling? NO Closes #46778 from amaliujia/test_case_with_sql_config. Authored-by: Rui Wang Signed-off-by: Wenchen Fan --- .../spark/sql/hive/execution/SQLQuerySuite.scala | 113 ++--- 1 file changed, 55 insertions(+), 58 deletions(-) diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala index 0bcac639443c..05b73e31d115 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala @@ -178,24 +178,24 @@ abstract class SQLQuerySuiteBase extends QueryTest with SQLTestUtils with TestHi |PARTITIONED BY (state STRING, month INT) |STORED AS PARQUET """.stripMargin) +withSQLConf("hive.exec.dynamic.partition.mode" -> "nonstrict") { + sql("INSERT INTO TABLE orders PARTITION(state, month) SELECT * FROM orders1") + sql("INSERT INTO TABLE orderupdates PARTITION(state, month) SELECT * FROM orderupdates1") -sql("set hive.exec.dynamic.partition.mode=nonstrict") -sql("INSERT INTO TABLE orders PARTITION(state, month) SELECT * FROM orders1") -sql("INSERT INTO TABLE orderupdates PARTITION(state, month) SELECT * FROM orderupdates1") - -checkAnswer( - sql( -""" - |select orders.state, orders.month - |from orders - |join ( - | select distinct orders.state,orders.month - | from orders - | join orderupdates - |on orderupdates.id = orders.id) ao - | on ao.state = orders.state and ao.month = orders.month + checkAnswer( +sql( + """ +|select orders.state, orders.month +|from orders +|join ( +| select distinct orders.state,orders.month +| from orders +| join orderupdates +|on orderupdates.id = orders.id) ao +| on ao.state = orders.state and ao.month = orders.month """.stripMargin), - (1 to 6).map(_ => Row("CA", 20151))) +(1 to 6).map(_ => Row("CA", 20151))) +} } } } @@ -715,21 +715,23 @@ abstract class SQLQuerySuiteBase extends QueryTest with SQLTestUtils with TestHi } test("command substitution") { -sql("set tbl=src") -checkAnswer( - sql("SELECT key FROM ${hiveconf:tbl} ORDER BY key, value limit 1"), - sql("SELECT key FROM src ORDER BY key, value limit 1").collect().toSeq) +withSQLConf("tbl" -> "src") { + checkAnswer( +sql("SELECT key FROM ${hiveconf:tbl} ORDER BY key, value limit 1"), +sql("SELECT key FROM src ORDER BY key, value limit 1").collect().toSeq) +} -sql("set spark.sql.variable.substitute=false") // disable the substitution -sql("set tbl2=src") -intercept[Exception] { - sql("SELECT key FROM ${hiveconf:tbl2} ORDER BY key, value limit 1").collect() +withSQLConf("tbl2" -> "src", 
"spark.sql.variable.substitute" -> "false") { + intercept[Exception] { +sql("SELECT key FROM ${hiveconf:tbl2} ORDER BY key, value limit 1").collect() + } } -sql("set spark.sql.variable.substitute=true") // enable the substitution -checkAnswer( - sql("SELECT key FROM ${hiveconf:tbl2} ORDER BY key, value limit 1"), - sql("SELECT key FROM src ORDER BY key, value li
(spark) branch master updated (a86bca131028 -> e6236af3d08c)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from a86bca131028 [SPARK-48445][SQL] Don't inline UDFs with expensive children add e6236af3d08c [SPARK-48000][SQL] Enable hash join support for all collations (StringType) No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/util/CollationFactory.java | 11 ++ .../catalyst/analysis/RewriteCollationJoin.scala | 45 ++ .../sql/catalyst/expressions/CollationKey.scala| 47 ++ .../expressions/CollationExpressionSuite.scala | 26 .../spark/sql/execution/SparkOptimizer.scala | 4 +- .../org/apache/spark/sql/CollationSuite.scala | 166 - 6 files changed, 264 insertions(+), 35 deletions(-) create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteCollationJoin.scala create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationKey.scala
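A conceptual model of the rewrite in plain Scala (illustrative only; the actual rule injects a `CollationKey` expression into the join keys): hash-joining collated strings needs a key whose binary equality coincides with collation equality. Lowercasing stands in here for the key of a lowercase collation:

```
def collationKey(s: String): String = s.toLowerCase // stand-in for CollationKey

val left = Seq("abc", "DEF")
val right = Seq("ABC", "def", "xyz")

// Build the hash side on the derived key, then probe with the same key:
// strings equal under the collation now land in the same bucket.
val buckets = right.groupBy(collationKey)
val joined = left.flatMap(l => buckets.getOrElse(collationKey(l), Nil).map(r => (l, r)))

assert(joined == Seq(("abc", "ABC"), ("DEF", "def")))
```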
svn commit: r69427 - in /dev/spark/v4.0.0-preview1-rc3-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/articles/sparkr-vignettes_files/ _site/api/R/articles/sparkr-vignettes_
Author: wenchen Date: Tue May 28 17:45:42 2024 New Revision: 69427 Log: Apache Spark v4.0.0-preview1-rc3 docs [This commit notification would consist of 4816 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
(spark) branch master updated: [SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (Contains, StartsWith, EndsWith, StringLocate)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 249390017ef4 [SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (Contains, StartsWith, EndsWith, StringLocate) 249390017ef4 is described below commit 249390017ef4a045037213dec386e16cca125080 Author: Uros Bojanic <157381213+uros...@users.noreply.github.com> AuthorDate: Tue May 28 10:05:12 2024 -0700 [SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (Contains, StartsWith, EndsWith, StringLocate) ### What changes were proposed in this pull request? String searching in UTF8_BINARY_LCASE now works on character-level, rather than on byte-level. For example: `contains("İ", "i");` now returns **false**, because there exists no `start, len` such that `lowercase(substring("İ", start, len)) == "i"`. ### Why are the changes needed? Fix functions that give unusable results due to one-to-many case mapping when performing string search under UTF8_BINARY_LCASE (see example above). ### Does this PR introduce _any_ user-facing change? Yes, behaviour of `contains`, `startswith`, `endswith`, and `locate`/`position` expressions is changed for edge cases with one-to-many case mapping. ### How was this patch tested? New unit tests in `CollationSupportSuite`. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46511 from uros-db/alter-lcase-impl. Authored-by: Uros Bojanic <157381213+uros...@users.noreply.github.com> Signed-off-by: Wenchen Fan --- .../catalyst/util/CollationAwareUTF8String.java| 169 + .../spark/sql/catalyst/util/CollationSupport.java | 8 +- .../org/apache/spark/unsafe/types/UTF8String.java | 118 -- .../spark/unsafe/types/CollationSupportSuite.java | 129 +--- .../apache/spark/unsafe/types/UTF8StringSuite.java | 105 - 5 files changed, 278 insertions(+), 251 deletions(-) diff --git a/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java b/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java index ee0d611d7e65..0d0094d8d0a0 100644 --- a/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java +++ b/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java @@ -34,6 +34,155 @@ import java.util.Map; * Utility class for collation-aware UTF8String operations. */ public class CollationAwareUTF8String { + + /** + * The constant value to indicate that the match is not found when searching for a pattern + * string in a target string. + */ + private static final int MATCH_NOT_FOUND = -1; + + /** + * Returns whether the target string starts with the specified prefix, starting from the + * specified position (0-based index referring to character position in UTF8String), with respect + * to the UTF8_BINARY_LCASE collation. The method assumes that the prefix is already lowercased + * prior to method call to avoid the overhead of calling .toLowerCase() multiple times on the + * same prefix string. 
+ * + * @param target the string to be searched in + * @param lowercasePattern the string to be searched for + * @param startPos the start position for searching (in the target string) + * @return whether the target string starts with the specified prefix in UTF8_BINARY_LCASE + */ + public static boolean lowercaseMatchFrom( + final UTF8String target, + final UTF8String lowercasePattern, + int startPos) { +return lowercaseMatchLengthFrom(target, lowercasePattern, startPos) != MATCH_NOT_FOUND; + } + + /** + * Returns the length of the substring of the target string that starts with the specified + * prefix, starting from the specified position (0-based index referring to character position + * in UTF8String), with respect to the UTF8_BINARY_LCASE collation. The method assumes that the + * prefix is already lowercased. The method only considers the part of target string that + * starts from the specified (inclusive) position (that is, the method does not look at UTF8 + * characters of the target string at or after position `endPos`). If the prefix is not found, + * MATCH_NOT_FOUND is returned. + * + * @param target the string to be searched in + * @param lowercasePattern the string to be searched for + * @param startPos the start position for searching (in the target string) + * @return length of the target substring that starts with the specified prefix in lowercase + */ + private static int lowercaseMatchLengthFrom( + final UTF8Str
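[Editor's note] The javadoc above is cut off in this notification, but the definition in the commit message is enough to sketch the semantics. Below is a brute-force Scala sketch of the character-level contains check (is there some `start, len` such that `lowercase(substring(target, start, len)) == pattern`?), using plain `java.lang.String` rather than Spark's `UTF8String`; the real `CollationAwareUTF8String` code matches incrementally instead of enumerating substrings.

```
import java.util.Locale

// Character-level definition from the commit message: does any codepoint
// substring of `target` lowercase to exactly `lowercasePattern`?
def lcaseContains(target: String, lowercasePattern: String): Boolean = {
  val cps: Array[Int] = target.codePoints().toArray
  (0 to cps.length).exists { start =>
    (start to cps.length).exists { end =>
      new String(cps, start, end - start).toLowerCase(Locale.ROOT) == lowercasePattern
    }
  }
}

// Byte/string-level lowercasing says true: "İ" lowercases to "i̇"
// (i + combining dot above), which contains "i" as a substring.
println("İ".toLowerCase(Locale.ROOT).contains("i")) // true
// The character-level definition says false: the only substrings of "İ"
// lowercase to "" or "i̇", and neither equals "i".
println(lcaseContains("İ", "i"))                    // false
```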
(spark) branch master updated (731a2cfcffae -> e9a3ed857954)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 731a2cfcffae [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier add e9a3ed857954 [SPARK-48159][SQL] Extending support for collated strings on datetime expressions No new revisions were added by this update. Summary of changes: .../catalyst/expressions/datetimeExpressions.scala | 38 ++-- .../spark/sql/CollationSQLExpressionsSuite.scala | 234 + 2 files changed, 254 insertions(+), 18 deletions(-)
(spark) branch branch-3.5 updated: [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 7313d71438e4 [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier 7313d71438e4 is described below commit 7313d71438e4691f7c086e90ded4a6f644cdcdc5 Author: Nikola Mandic AuthorDate: Tue May 28 09:59:53 2024 -0700 [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier ### What changes were proposed in this pull request? `PlanWithUnresolvedIdentifier` is rewritten later in analysis, which causes rules like `SubstituteUnresolvedOrdinals` to miss the new plan. This causes the following queries to fail: ``` create temporary view identifier('v1') as (select my_col from (values (1), (2), (1) as (my_col)) group by 1); -- cache table identifier('t1') as (select my_col from (values (1), (2), (1) as (my_col)) group by 1); -- create table identifier('t2') as (select my_col from (values (1), (2), (1) as (my_col)) group by 1); insert into identifier('t2') select my_col from (values (3) as (my_col)) group by 1; ``` Fix this by explicitly applying the rules after the plan rewrite. ### Why are the changes needed? To fix the described bug. ### Does this PR introduce _any_ user-facing change? Yes, it fixes the mentioned problematic queries. ### How was this patch tested? Updated existing `identifier-clause.sql` golden file. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46580 from nikolamand-db/SPARK-48273. Authored-by: Nikola Mandic Signed-off-by: Wenchen Fan (cherry picked from commit 731a2cfcffaeeeb1f1c107080ca77000330d79b5) Signed-off-by: Wenchen Fan --- .../spark/sql/catalyst/analysis/Analyzer.scala | 9 ++-- .../analysis/ResolveIdentifierClause.scala | 11 ++-- .../spark/sql/catalyst/rules/RuleExecutor.scala| 2 +- .../analyzer-results/identifier-clause.sql.out | 59 ++ .../sql-tests/inputs/identifier-clause.sql | 9 .../sql-tests/results/identifier-clause.sql.out| 56 6 files changed, 139 insertions(+), 7 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index ed7b978045c7..5890a9692e20 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -255,7 +255,7 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor TypeCoercion.typeCoercionRules } - override def batches: Seq[Batch] = Seq( + private def earlyBatches: Seq[Batch] = Seq( Batch("Substitution", fixedPoint, // This rule optimizes `UpdateFields` expression chains so looks more like optimization rule.
// However, when manipulating deeply nested schema, `UpdateFields` expression tree could be @@ -275,7 +275,10 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor Batch("Simple Sanity Check", Once, LookupFunctions), Batch("Keep Legacy Outputs", Once, - KeepLegacyOutputs), + KeepLegacyOutputs) + ) + + override def batches: Seq[Batch] = earlyBatches ++ Seq( Batch("Resolution", fixedPoint, new ResolveCatalogs(catalogManager) :: ResolveInsertInto :: @@ -319,7 +322,7 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor ResolveTimeZone :: ResolveRandomSeed :: ResolveBinaryArithmetic :: - ResolveIdentifierClause :: + new ResolveIdentifierClause(earlyBatches) :: ResolveUnion :: ResolveRowLevelCommandAssignments :: RewriteDeleteFromTable :: diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala index e0d3e5629ef8..422bad3d89e2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala @@ -20,19 +20,24 @@ package org.apache.spark.sql.catalyst.analysis import org.apache.spark.sql.catalyst.expressions.{AliasHelper, EvalHelper, Expression} import org.apache.spark.sql.catalyst.parser.CatalystSqlParser import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan -import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.cat
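[Editor's note] The diff above is truncated, but the shape of the fix is visible: the analyzer's early batches are factored out so `ResolveIdentifierClause` can re-run them on the plan it produces. A toy Scala model of the ordering bug, with hypothetical rule and node names (not Spark's actual classes):

```
sealed trait Plan
case class Placeholder(sql: String) extends Plan // stands in for PlanWithUnresolvedIdentifier
case class OrdinalRef(n: Int) extends Plan
case class ColumnRef(name: String) extends Plan

type Rule = Plan => Plan

// Early "Substitution"-style rule: replace ordinals with column references.
val substituteOrdinals: Rule = { case OrdinalRef(1) => ColumnRef("my_col"); case p => p }
// Late rule: the placeholder is only rewritten now, and the rewritten
// subtree happens to contain an ordinal again.
val resolveIdentifier: Rule = { case Placeholder(_) => OrdinalRef(1); case p => p }

def run(rules: Seq[Rule], p: Plan): Plan = rules.foldLeft(p)((q, r) => r(q))

// Buggy ordering: substitution already ran, so the ordinal introduced by the
// late rewrite is never substituted (an unresolved ordinal in the real analyzer).
println(run(Seq(substituteOrdinals, resolveIdentifier), Placeholder("group by 1")))
// => OrdinalRef(1)

// The fix, in spirit: re-apply the early rules to the rewritten plan.
println(run(Seq(substituteOrdinals, resolveIdentifier, substituteOrdinals), Placeholder("group by 1")))
// => ColumnRef("my_col")
```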
(spark) branch master updated (7fe1b93884aa -> 731a2cfcffae)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 7fe1b93884aa [SPARK-46841][SQL] Add collation support for ICU locales and collation specifiers add 731a2cfcffae [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/analysis/Analyzer.scala | 9 ++-- .../analysis/ResolveIdentifierClause.scala | 11 ++-- .../spark/sql/catalyst/rules/RuleExecutor.scala| 2 +- .../analyzer-results/identifier-clause.sql.out | 59 ++ .../sql-tests/inputs/identifier-clause.sql | 9 .../sql-tests/results/identifier-clause.sql.out| 56 6 files changed, 139 insertions(+), 7 deletions(-)
(spark) branch master updated: [SPARK-46841][SQL] Add collation support for ICU locales and collation specifiers
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7fe1b93884aa [SPARK-46841][SQL] Add collation support for ICU locales and collation specifiers 7fe1b93884aa is described below commit 7fe1b93884aa8e9ba20f19351b8537c687b8f59c Author: Nikola Mandic AuthorDate: Tue May 28 09:56:16 2024 -0700 [SPARK-46841][SQL] Add collation support for ICU locales and collation specifiers ### What changes were proposed in this pull request? Languages and localization for collations are supported by the ICU library. The collation naming format is as follows: ``` <2-letter language code>[_<4-letter script>][_<3-letter country code>][_specifier_specifier...] ``` The locale specifier consists of the first part of the collation name (language + script + country). Locale specifiers need to be stable across ICU versions; to keep existing IDs and names invariant, we introduce a golden file with the locale table, which should cause a CI failure on any silent changes. Currently supported optional specifiers: - `CS`/`CI` - case sensitivity, default is case-sensitive; supported by configuring ICU collation levels - `AS`/`AI` - accent sensitivity, default is accent-sensitive; supported by configuring ICU collation levels Users can specify collation specifiers in any order, except for the locale, which is mandatory and must come first. There is a one-to-one mapping between collation IDs and collation names defined in `CollationFactory`. ### Why are the changes needed? To add languages and localization support for collations. ### Does this PR introduce _any_ user-facing change? Yes, it adds new predefined collations. ### How was this patch tested? Added checks to `CollationFactorySuite` and ICU locale map golden file. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46180 from nikolamand-db/SPARK-46841.
Authored-by: Nikola Mandic Signed-off-by: Wenchen Fan --- .../spark/sql/catalyst/util/CollationFactory.java | 678 + .../spark/unsafe/types/CollationFactorySuite.scala | 323 +- .../src/main/resources/error/error-conditions.json | 4 +- .../apache/spark/sql/PlanGenerationTestSuite.scala | 4 +- .../src/main/protobuf/spark/connect/types.proto| 2 +- .../connect/common/DataTypeProtoConverter.scala| 9 +- .../query-tests/queries/csv_from_dataset.json | 2 +- .../query-tests/queries/csv_from_dataset.proto.bin | Bin 158 -> 169 bytes .../query-tests/queries/function_lit_array.json| 4 +- .../queries/function_lit_array.proto.bin | Bin 889 -> 911 bytes .../query-tests/queries/function_typedLit.json | 32 +- .../queries/function_typedLit.proto.bin| Bin 1199 -> 1381 bytes .../query-tests/queries/json_from_dataset.json | 2 +- .../queries/json_from_dataset.proto.bin| Bin 169 -> 180 bytes python/pyspark/sql/connect/proto/types_pb2.py | 78 +-- python/pyspark/sql/connect/proto/types_pb2.pyi | 11 +- python/pyspark/sql/connect/types.py| 5 +- python/pyspark/sql/types.py| 27 +- .../org/apache/spark/sql/internal/SQLConf.scala| 15 +- .../expressions/CollationExpressionSuite.scala | 33 +- .../resources/collations/ICU-collations-map.md | 143 + .../sql-tests/analyzer-results/collations.sql.out | 77 +++ .../test/resources/sql-tests/inputs/collations.sql | 13 + .../resources/sql-tests/results/collations.sql.out | 88 +++ .../org/apache/spark/sql/CollationSuite.scala | 2 +- .../apache/spark/sql/ICUCollationsMapSuite.scala | 69 +++ .../apache/spark/sql/internal/SQLConfSuite.scala | 3 +- 27 files changed, 1388 insertions(+), 236 deletions(-) diff --git a/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java b/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java index 0133c3feb611..fce12510afaf 100644 --- a/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java +++ b/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java @@ -19,6 +19,7 @@ package org.apache.spark.sql.catalyst.util; import java.text.CharacterIterator; import java.text.StringCharacterIterator; import java.util.*; +import java.util.concurrent.ConcurrentHashMap; import java.util.function.BiFunction; import java.util.function.ToLongFunction; @@ -173,26 +174,546 @@ public final class CollationFactory { } /** - * Constructor with comparators that are inherited from the given collator. + * Collation ID is defined as 32-bit integer. We specify binary
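[Editor's note] A quick illustration of the naming scheme described above, assuming an active `spark` session. The `collate` function and the `CS`/`CI`/`AS`/`AI` specifiers come from this PR's description; the concrete locale names below follow the documented `<language>[_<script>][_<country>]` format but are illustrative only. The authoritative list is the `ICU-collations-map.md` golden file added by the commit.

```
// Case-insensitive comparison under a German-locale collation (illustrative name).
spark.sql("SELECT startswith(collate('Überblick', 'de_CI'), 'über')").show()    // true
// Accent-insensitive comparison under a French-locale collation (illustrative name).
spark.sql("SELECT collate('hôtel', 'fr_AI') = 'hotel'").show()                  // true
// Specifiers can be combined in any order after the mandatory locale prefix.
spark.sql("SELECT collate('Crème Brûlée', 'fr_AI_CI') = 'CREME BRULEE'").show() // true
```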
svn commit: r69426 - /dev/spark/v4.0.0-preview1-rc3-bin/
Author: wenchen Date: Tue May 28 16:50:57 2024 New Revision: 69426 Log: Apache Spark v4.0.0-preview1-rc3 Added: dev/spark/v4.0.0-preview1-rc3-bin/ dev/spark/v4.0.0-preview1-rc3-bin/SparkR_4.0.0-preview1.tar.gz (with props) dev/spark/v4.0.0-preview1-rc3-bin/SparkR_4.0.0-preview1.tar.gz.asc dev/spark/v4.0.0-preview1-rc3-bin/SparkR_4.0.0-preview1.tar.gz.sha512 dev/spark/v4.0.0-preview1-rc3-bin/pyspark-4.0.0.dev1.tar.gz (with props) dev/spark/v4.0.0-preview1-rc3-bin/pyspark-4.0.0.dev1.tar.gz.asc dev/spark/v4.0.0-preview1-rc3-bin/pyspark-4.0.0.dev1.tar.gz.sha512 dev/spark/v4.0.0-preview1-rc3-bin/pyspark_connect-4.0.0.dev1.tar.gz (with props) dev/spark/v4.0.0-preview1-rc3-bin/pyspark_connect-4.0.0.dev1.tar.gz.asc dev/spark/v4.0.0-preview1-rc3-bin/pyspark_connect-4.0.0.dev1.tar.gz.sha512 dev/spark/v4.0.0-preview1-rc3-bin/spark-4.0.0-preview1-bin-hadoop3.tgz (with props) dev/spark/v4.0.0-preview1-rc3-bin/spark-4.0.0-preview1-bin-hadoop3.tgz.asc dev/spark/v4.0.0-preview1-rc3-bin/spark-4.0.0-preview1-bin-hadoop3.tgz.sha512 dev/spark/v4.0.0-preview1-rc3-bin/spark-4.0.0-preview1-bin-without-hadoop.tgz (with props) dev/spark/v4.0.0-preview1-rc3-bin/spark-4.0.0-preview1-bin-without-hadoop.tgz.asc dev/spark/v4.0.0-preview1-rc3-bin/spark-4.0.0-preview1-bin-without-hadoop.tgz.sha512 dev/spark/v4.0.0-preview1-rc3-bin/spark-4.0.0-preview1.tgz (with props) dev/spark/v4.0.0-preview1-rc3-bin/spark-4.0.0-preview1.tgz.asc dev/spark/v4.0.0-preview1-rc3-bin/spark-4.0.0-preview1.tgz.sha512 Added: dev/spark/v4.0.0-preview1-rc3-bin/SparkR_4.0.0-preview1.tar.gz == Binary file - no diff available. Propchange: dev/spark/v4.0.0-preview1-rc3-bin/SparkR_4.0.0-preview1.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v4.0.0-preview1-rc3-bin/SparkR_4.0.0-preview1.tar.gz.asc == --- dev/spark/v4.0.0-preview1-rc3-bin/SparkR_4.0.0-preview1.tar.gz.asc (added) +++ dev/spark/v4.0.0-preview1-rc3-bin/SparkR_4.0.0-preview1.tar.gz.asc Tue May 28 16:50:57 2024 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJHBAABCgAxFiEETclnbO+ag+mPygJ4TWYghDzYf1oFAmZWCNcTHHdlbmNoZW5A +YXBhY2hlLm9yZwAKCRBNZiCEPNh/WqwvD/4/Lap7M5blsZvzAmevVyZ58wIESquP +8BNt/2SVuj1gQJizxsqyTk+knyFSQ/NPPlasHc6G/yd5aUAAaggVO1S4QCtKvtQ9 +E9EQQ8BFLjC11Srg/93dDdMuLXsB+SNiYoT9yILtCK9Hs2M84i3aXlla3jPYs1qZ +/E/a5JmqMhxaBeNk2L4uo6KqevanH5d2Xi9Xe8ulln2xJqJARSVVSOr3qO0BdZjb +wv7xyDo7wRW96dQywx5gHPuZIL6Qu0bYqRRQAaQZvwmeJnxah9jLZZKWp6E1eLCq +jD11l+FMauIzyO1B3BK9opsBze8G0mVTuUPFYww5C8DxfxwSDBzUZaGHlp1xmxiv +lF35PmB/FpRk9ddpzNucJnWddjS582wj+rxi3KnlFIusbTtDFpRFa+5sTa0GG2LO +wG5vBD2QHSWHQ3NnvGiffp6OIPOmw009+QNi7/JYfVrpsNHRqW5bBew3QeR756Jy +tFvOCN37wLzLwfEOGDou3lNyYFBlsFk37HqlnQpkmvokPzBJ2giWmwVnIc7iYub5 +DHtB86r/Vmqb1mkqsG9PbsBIzbRX6e1rTAQtQQbBYenaA63rAVwrLFt65Y2rTIt8 +D8ewS9cLhEJaf6ajndb5AlQRxX/hth5xmuSMEXib0V5V/BGgNtw9kQ6GouPmf16J +AOs0h20YWzkFmw== +=aEv0 +-END PGP SIGNATURE- Added: dev/spark/v4.0.0-preview1-rc3-bin/SparkR_4.0.0-preview1.tar.gz.sha512 == --- dev/spark/v4.0.0-preview1-rc3-bin/SparkR_4.0.0-preview1.tar.gz.sha512 (added) +++ dev/spark/v4.0.0-preview1-rc3-bin/SparkR_4.0.0-preview1.tar.gz.sha512 Tue May 28 16:50:57 2024 @@ -0,0 +1 @@ +b2a81b7239d39b2af3a81a82fa8541db8551a7503a602766e37bdaf70495123e2d3fa68cd4b684af2df2386f0212167a291cbc260d54ac985fd968dc09b3a0d2 SparkR_4.0.0-preview1.tar.gz Added: dev/spark/v4.0.0-preview1-rc3-bin/pyspark-4.0.0.dev1.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v4.0.0-preview1-rc3-bin/pyspark-4.0.0.dev1.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v4.0.0-preview1-rc3-bin/pyspark-4.0.0.dev1.tar.gz.asc == --- dev/spark/v4.0.0-preview1-rc3-bin/pyspark-4.0.0.dev1.tar.gz.asc (added) +++ dev/spark/v4.0.0-preview1-rc3-bin/pyspark-4.0.0.dev1.tar.gz.asc Tue May 28 16:50:57 2024 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJHBAABCgAxFiEETclnbO+ag+mPygJ4TWYghDzYf1oFAmZWCNkTHHdlbmNoZW5A +YXBhY2hlLm9yZwAKCRBNZiCEPNh/Wox1EAC7Eh7Yxy6rmYBTdG5AptbPLt7RyZei +gjl/8ROpgQ1p7ehghXuLtEkcuy/GSSU02BLsrbkM/QtDwElRBkEDVABcQyS7Jgia +2WFuK8E1BPPWlIl07KhqEmWwXSSzLRuVAQhMPFjT5g7Op/viqOCXbXSEGoHe8+8Z +4OJ9zr8qpeMM9ZLivQppq5PAodcKohR7n5BBHFjShNhU3XJ3Cl3pFMxg9weCCuGD +2SQgPIveai7P9Lhe5Cl5eXiSOCEG+r4QJjk9d5FjAH+VK0qcH0guW41eeHiv3k1y +DFeh3PJlvUx1TP8/E7hiMUVA5H5HorHkzOraQrFaC+D+tqMWAQFSXThrJmYSRaEU +h2SFOdQ8Bk4AsAzikzyALULT+gDKhGhtFWLpz5eyt2tWOKL8sCpcF0AnrrssusJp +5p+9xhBvs9L
(spark) 01/01: Preparing Spark release v4.0.0-preview1-rc3
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to tag v4.0.0-preview1-rc3 in repository https://gitbox.apache.org/repos/asf/spark.git commit 7a7a8bc4bab591ac8b98b2630b38c57adf619b82 Author: Wenchen Fan AuthorDate: Tue May 28 16:23:00 2024 + Preparing Spark release v4.0.0-preview1-rc3 --- R/pkg/R/sparkR.R | 4 ++-- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- common/utils/pom.xml | 2 +- common/variant/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/profiler/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/yarn/pom.xml | 2 +- sql/api/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 46 files changed, 49 insertions(+), 49 deletions(-) diff --git a/R/pkg/R/sparkR.R b/R/pkg/R/sparkR.R index 0be7e5da24d2..478acf514ef3 100644 --- a/R/pkg/R/sparkR.R +++ b/R/pkg/R/sparkR.R @@ -456,8 +456,8 @@ sparkR.session <- function( # Check if version number of SparkSession matches version number of SparkR package jvmVersion <- callJMethod(sparkSession, "version") - # Remove -SNAPSHOT from jvm versions - jvmVersionStrip <- gsub("-SNAPSHOT", "", jvmVersion, fixed = TRUE) + # Remove -preview1 from jvm versions + jvmVersionStrip <- gsub("-preview1", "", jvmVersion, fixed = TRUE) rPackageVersion <- paste0(packageVersion("SparkR")) if (jvmVersionStrip != rPackageVersion) { diff --git a/assembly/pom.xml b/assembly/pom.xml index 58e7ae5bb0c7..417e7c23ca9f 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.13 -4.0.0-SNAPSHOT +4.0.0-preview1 ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 046648e9c2ae..e1a4497387a2 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.13 -4.0.0-SNAPSHOT +4.0.0-preview1 ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index cdb5bd72158a..d8dff6996cec 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.13 -4.0.0-SNAPSHOT +4.0.0-preview1 ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 0f7036ef
(spark) tag v4.0.0-preview1-rc3 created (now 7a7a8bc4bab5)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to tag v4.0.0-preview1-rc3 in repository https://gitbox.apache.org/repos/asf/spark.git at 7a7a8bc4bab5 (commit) This tag includes the following new commits: new 7a7a8bc4bab5 Preparing Spark release v4.0.0-preview1-rc3 The 1 revision listed above as "new" is entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
svn commit: r69425 - /dev/spark/KEYS
Author: wenchen Date: Tue May 28 16:09:45 2024 New Revision: 69425 Log: Update KEYS Modified: dev/spark/KEYS Modified: dev/spark/KEYS == --- dev/spark/KEYS (original) +++ dev/spark/KEYS Tue May 28 16:09:45 2024 @@ -704,62 +704,61 @@ kyHyHY5kPG9HfDOSahPz =SDAz -END PGP PUBLIC KEY BLOCK- -pub rsa4096 2024-05-07 [SC] - 4DC9676CEF9A83E98FCA02784D6620843CD87F5A -uid Wenchen Fan (CODE SIGNING KEY) -sub rsa4096 2024-05-07 [E] +pub 4096R/4F4FDC8A 2018-09-18 +uid Wenchen Fan (CODE SIGNING KEY) +sub 4096R/6F3F5B0E 2018-09-18 -BEGIN PGP PUBLIC KEY BLOCK- +Version: GnuPG v1 -mQINBGY6XpcBEADBeNz3IBYriwrPzMYJJO5u1DaWAJ4Sryx6PUZgvssrcqojYVTh -MjtlBkWRcNquAyDrVlU1vtq1yMq5KopQoAEi/l3xaEDZZ0IFAob6+GlGXEon2Jvf -0FXQsx+Df4nMVl7KPqh68T++Z4GkvK5wyyN9uaUTWL2deGeinVxTh6qWQT8YiCd5 -wof+Dk5IIzKQ5VIBhU/U9S0jo/pqhH4okcZGTyT2Q7sfg4eXl5+Y2OR334RkvTcX -uJjcnJ8BUbBSm1UhNg4OGBEJgi+lE1GEgw4juOfTAPh9fx8SCLhuX0m6Qc/y9bAK -Q4zejbF5F2Um9dqrZqg6Egp+nlzydn59hq9owSnQ6JdoA/PLcgoign0sghu9xGCR -GpgI2kS7Q8bu6dy7T0BfUerLZ1FHu7nCT2ZNSIh/Y2eOhuBhUr3llg8xa3PZZob/ -2sZE2dJ3g/qp2Nbo+s5Q5kELtuo6cZD0EISQwt68hGWIgxs0vtci2c2kQYFS0oqw -fGynEeDFZRHV3ET5rioYaoPi70Cnibght5ocL0t6sl0RQQVp6k2i1aofJbZA480N -ivuJ5agGaSRxmIDk6JlDsHJGxO9oC066ZLJiR6i0JUinGP7sw/nNmgup/AB+y4hW -9WdeAFyYmuYysDRRyE6z1MPDp1R00MyGxHNFDF64/JPY/nKKFdXp+aCazwARAQAB +mQINBFugiYgBEAC4DsJBWF3VjWiKEiD8XNPRTg3Bnw52fe4bTB9Jvh/q0VStJjO7 +CSHZ1/P5h60zbS5UWLP2mt+c0FaW6wv7PxafCnd1MPENGBkttZbC4UjWDSbPp0vx +fkUfrAqflWvO1AaCveg2MlyQdLZ1HwVz+PDLWqE+Ev2p3Si4Jfx5P2O9FmWt8a/b +Wea/4gfy/5zFWRberQjt4CkSBuNU+cOo19/n32JJJYbRqrzFAGs/DJUIxNXC1qef +c2iB3dyff1mkLb9Vzd1RfhZaSNUElo67o4Vi6SswgvHxoE03wIcoJvBTafqLxy6p +mt5SAzOyvvmOVcLNqP9i5+c4sBrxvQ2ZEZrZt7dKfhbh4W8ged/TNWMoNOCX2usD +Fj17KrFAEaeqtEwRdwZMxGqKI/NxANkdPSxS4T/JQoi+N6LBJ88yzmeCquA8MT0b +/H4ziyjgrSRugCE6jcsbuObQsDxiqPSSXeWSjPoYq876JcqAgZzSYYdlGVw2J9Vb +46hhEqhGk+91vK6CtyuhKv5KXk1B3Rhhc5znKWcahD3cpISxwTSzN9OwQHEd8Ovv +x0WAhY3WOexrBekH7Sy00gjaHSAHFj3ReITfffWkv6t4TGLyohEOfgdxFvq03Fhd +p7bWDmux47jP6AUUjP0VXRsG9ev3ch+bbcbRlo15HPBtyehoPn4BellFAQARAQAB tDNXZW5jaGVuIEZhbiAoQ09ERSBTSUdOSU5HIEtFWSkgPHdlbmNoZW5AYXBhY2hl -Lm9yZz6JAlEEEwEIADsWIQRNyWds75qD6Y/KAnhNZiCEPNh/WgUCZjpelwIbAwUL -CQgHAgIiAgYVCgkICwIEFgIDAQIeBwIXgAAKCRBNZiCEPNh/WkofD/9sI7J3i9Ck -NOlHpVnjAaHjyGX5cVA2dZGniJdLf5yOKOI6pu7dMW+NThsXO1Iv+BRYo7una6/Q -vUquKKxCXIN3vNmKIB1e9lj4MaIhCRmXUSQxjkVa9JW3P/F520Ct3VjiCZ5IjPv+ -g1hF/wrkuuoAFlcC/bfGWafkaZgszavSpCdp27mUXUNbvLW0dPJ3+ay4cDPuT1DI -6DhB8qpqN7gInDFACW2qtQ2KZh1JFGy5ZccQ9dB3t/B4BYiUie6a3eQWgKqLF1hw -8yHY3DkCVGfnXJk4+LMWqgazQxoB6oZjBvoQYtGOPXr1ZbmtiRHCDM5KmZ+QmIXB -ZGBXkLaqt2QGxlwUGlvn+nKuTsp8VL1APIlKdMpvMW59uz1ycZHMeTJGAMtZw8Qm -kxG62kqnDYeZ6oWwinY3wYP4UmqFSWIfcHMfBwED4uOC//r9H1bO+JRFMwOxqSN7 -kGfFJoV5eOvMOwRnXPJiPpnQEHPEkp/TAl2ANHWzdXy9TifiHOvTln3NXQVpznnW -H6f9+W36J1IE9EWktciptKUtvwY1np+G71Swa0Q4mNgb8OGf6UNJGv4vPbSlhzlr -1a5oYP59eHO3XqANcuKyTFxfja+rgrMldufZFCk1hSnBdAic/jaHrhIQSLcTGFiJ -QVyiC2VlO2eZCkCTfoSlolwgzzoY4wNumLkCDQRmOl6XARAAt+N+djFZOuJdLcSz -pz6nG88gxLmPwf+Xlhv2+xDS3wyM1OWmDAkeMDNq8OuZMes6ZXwRxDvSj7w7dlE6 -dQ1BlDz4RP4GoYG++dnPlHp/NWQ8I/eW8XC5uxkvl56YG/0DudoTLb5nxHtv+kpm -p+eVCqWRYI5RQPdcxEZzXEije+aEj2aMRQ8cO7RAgTamRWXr+XsRkSypZ8ttTISr -u+UuQPKT6XRMtkB2i8ekwO+jIK/mMrAteIF/cK0jv2JTlYmWrBtmGgYjHZHlzZak -/MzWN4tU5VbJMMXa9wHicZS0/cPV9Fz3dnR0sBVgaIDsK+/vRGxHd/LGFtXH+Wrp -pPMaR4FHCx3r44aL17B5lJocwf7Xma2gavOl80NR+a8iOW6biKdlALRZKX4G4cJj -1vnWHDJceZOuFWMVIs7zfJymvQpROCRED3q1el+zCICnLtBue6ikqv7nfyBNCaR2 -qZhw4TPMzzGTRIdKIalcSTi+bGfSYTsU2kVDBbH+0nD5I7Tx62H4shsJtgmwyP4R -q2dxJPpC4i+L09crjyl7rYvwHu4QU8vxcQXN4cH4O5pKOr2GoGnV8Y7kpZaRUo6w 
-/Q/Rx3I3UKAyYJv0R1mK4AifM0JzMkqxAUvUdUbs2obRT04sxtr1bA+9dLEv4b8c -YGKmRgt96GCNx1XZ8Q+FPdmsaO0AEQEAAYkCNgQYAQgAIBYhBE3JZ2zvmoPpj8oC -eE1mIIQ82H9aBQJmOl6XAhsMAAoJEE1mIIQ82H9aBfAQAKf6xHNuKibXcRMwqmcx -rx18d0dbeMEjrPqSe5vGOylLQZRpwZmKwflU9kZgOU2WRuqZsaPE0w5wxhsNDe8s -UqxW08xB6v8BVj6BT9umJQNyQF5CrsjkZe2EtmYlbdNmt4t8DMNEmhhasEglWUui -0se3I0wIwDaYAW+KppwzweO8SrUZVaB6QhOckRFhz/1wCNyc2Yp90OjWjuATffOE -ZWSeGPn9GCbtJ+SPtLtMUlxy/BoRA6OWv6H5VAt6pJVw3XPP/o450i7lYxbmbv8W -qm5/8nWx1XBvTvOxGoT9h+45bWjLTXtJJ2RhEftGHZ9439VSgssXBl+S/yjpnHOa -14tRCVABP8bgAQ7HEKZ9YyII6MOAEzNa2gNVKr7+gwB1ddrGdzx6TrIUwRlgilDJ -XORdEON4Ssx31Y1+Dt+d4lkkGu5Ymkj8iFIeH6FNOnFWM/stTmL0fE4IGpWbUHc+ -nqz7zEgili8TanLQRUmz9ClVJTG4G9t31FYF8nNzDPxug9oSMJXBfVlzhRMRZH3z -t/XdxNFHyu7rzXidiXTJSmujeqS++mKcXxx02m+V2qfwkAwnt6OS9NDLPVrzuuMN -NDfY3Gr4dTCbd+JQxtC0w4GuUV1V3lcOwyEjPKJVYuZwUl0UspRbNmtsaybRbzVs -+q68az33WU5++zSuqrU3fIRp -=1zLb +Lm9yZz6JAjgEEwECACIFAlugiYgCGwMGCwkIBwMCBhUIAgkKCwQWAgMBAh4BAheA +AAoJEGuscolPT9yKvqUP/i34exSQNs9NcDvjOQhrvpjYCarL4mdQZOjIn9JWxeWr +3nkzC9ozEIrb1zt8pqhiYr6qJhmx2EJgIwZTZZ9O0qHFMmYhYn/9/KKidE0XN6t3 +dFcbtRB1PGlc9b34PZNfdhD8PWA/UB1QC0TdTRNKhrIGGIZocrkaBral6uMJZAyV +kbb+s21cRupPLM2wmU1k3U4WxnaIq2foErhaPC9+OEDAcLH/OxwiekJTCsvZypzE +1laxo21rX1kgYzeAuqP4BfX5ARyrfM3O31Gh8asrx1bXD4z7dHqJxdJjh7ycdJdT
svn commit: r69418 - /dev/spark/v4.0.0-preview1-rc2-bin/
Author: wenchen Date: Tue May 28 07:41:08 2024 New Revision: 69418 Log: Apache Spark v4.0.0-preview1-rc2 Added: dev/spark/v4.0.0-preview1-rc2-bin/ dev/spark/v4.0.0-preview1-rc2-bin/SparkR_4.0.0-preview1.tar.gz (with props) dev/spark/v4.0.0-preview1-rc2-bin/SparkR_4.0.0-preview1.tar.gz.asc dev/spark/v4.0.0-preview1-rc2-bin/SparkR_4.0.0-preview1.tar.gz.sha512 dev/spark/v4.0.0-preview1-rc2-bin/pyspark-4.0.0.dev1.tar.gz (with props) dev/spark/v4.0.0-preview1-rc2-bin/pyspark-4.0.0.dev1.tar.gz.asc dev/spark/v4.0.0-preview1-rc2-bin/pyspark-4.0.0.dev1.tar.gz.sha512 dev/spark/v4.0.0-preview1-rc2-bin/pyspark_connect-4.0.0.dev1.tar.gz (with props) dev/spark/v4.0.0-preview1-rc2-bin/pyspark_connect-4.0.0.dev1.tar.gz.asc dev/spark/v4.0.0-preview1-rc2-bin/pyspark_connect-4.0.0.dev1.tar.gz.sha512 dev/spark/v4.0.0-preview1-rc2-bin/spark-4.0.0-preview1-bin-hadoop3.tgz (with props) dev/spark/v4.0.0-preview1-rc2-bin/spark-4.0.0-preview1-bin-hadoop3.tgz.asc dev/spark/v4.0.0-preview1-rc2-bin/spark-4.0.0-preview1-bin-hadoop3.tgz.sha512 dev/spark/v4.0.0-preview1-rc2-bin/spark-4.0.0-preview1-bin-without-hadoop.tgz (with props) dev/spark/v4.0.0-preview1-rc2-bin/spark-4.0.0-preview1-bin-without-hadoop.tgz.asc dev/spark/v4.0.0-preview1-rc2-bin/spark-4.0.0-preview1-bin-without-hadoop.tgz.sha512 dev/spark/v4.0.0-preview1-rc2-bin/spark-4.0.0-preview1.tgz (with props) dev/spark/v4.0.0-preview1-rc2-bin/spark-4.0.0-preview1.tgz.asc dev/spark/v4.0.0-preview1-rc2-bin/spark-4.0.0-preview1.tgz.sha512 Added: dev/spark/v4.0.0-preview1-rc2-bin/SparkR_4.0.0-preview1.tar.gz == Binary file - no diff available. Propchange: dev/spark/v4.0.0-preview1-rc2-bin/SparkR_4.0.0-preview1.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v4.0.0-preview1-rc2-bin/SparkR_4.0.0-preview1.tar.gz.asc == --- dev/spark/v4.0.0-preview1-rc2-bin/SparkR_4.0.0-preview1.tar.gz.asc (added) +++ dev/spark/v4.0.0-preview1-rc2-bin/SparkR_4.0.0-preview1.tar.gz.asc Tue May 28 07:41:08 2024 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJHBAABCgAxFiEETclnbO+ag+mPygJ4TWYghDzYf1oFAmZVh/cTHHdlbmNoZW5A +YXBhY2hlLm9yZwAKCRBNZiCEPNh/WtsEEACuWS2ZcvRneZIO8kTM0aH87vywbfEi +bZYIr47b1LhcTgKRXsJ9qjfESgy7dCw78ykdSTOIw7kSec60kXrlqsAgNEKtxDpg +B6yUD4+JSve9d9jQOCrwnYbUex+TzkIvWieDaU8DuuaAKf8gq1NMku2w26WBC11N +lNtEBy93rKuT51si5L5RK7Of58J9s5z8T0b1t/zXO9M+N7C+eDJly6EQ4+6STuYN +2q8+dne9l/tlthgQ30+YdOprU6ZRIwGukXRn830ZOOtfifF+ud7DVmk59dqmPzyX ++JiZuuVC56M19kpXt4hyg6cmOdG5wYoMZYApPueZCNUX+D4LC4pXkrI+4d90UnzL +jlQDD92ChhrWFCUSCg1ysjFH20QXgfiqoLMHBJJ3jWZGJfAhvxBOW7Y9wLND68HI +rFTxld/RkHFouwssasgxTL00mlWRZOXWdm/iByZS3J2U/bQgk4TbEqyHlCvKPUuK +0UaHNVpO+jUjJ8uTCKnk9JgZTKTPGNx3nFtNdE/vckIKOuZkhYVq9jvIUBoRsoCb +Rh/X5+aHUHZJT7faNOBVeNLgAGugIf8t/K3GysJSXxnXEBjDX94b5ruy8Mp5Odja +OAAr4U/RpQQvoGLnoc0ZAYok8V5RQW7Vy7Q8Tf+0RnPis4VIWB0XeA4Ts5QLfyd9 +nb/DxsuDsKofxw== +=t0/V +-END PGP SIGNATURE- Added: dev/spark/v4.0.0-preview1-rc2-bin/SparkR_4.0.0-preview1.tar.gz.sha512 == --- dev/spark/v4.0.0-preview1-rc2-bin/SparkR_4.0.0-preview1.tar.gz.sha512 (added) +++ dev/spark/v4.0.0-preview1-rc2-bin/SparkR_4.0.0-preview1.tar.gz.sha512 Tue May 28 07:41:08 2024 @@ -0,0 +1 @@ +2df825b17df1103bc368a8c382e1a8accfb82163b58adfd56026b528d35af21c93342b243658d4ecee50300380dea2755ab3f7eb5a4296d84089f392c62a8440 SparkR_4.0.0-preview1.tar.gz Added: dev/spark/v4.0.0-preview1-rc2-bin/pyspark-4.0.0.dev1.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v4.0.0-preview1-rc2-bin/pyspark-4.0.0.dev1.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v4.0.0-preview1-rc2-bin/pyspark-4.0.0.dev1.tar.gz.asc == --- dev/spark/v4.0.0-preview1-rc2-bin/pyspark-4.0.0.dev1.tar.gz.asc (added) +++ dev/spark/v4.0.0-preview1-rc2-bin/pyspark-4.0.0.dev1.tar.gz.asc Tue May 28 07:41:08 2024 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJHBAABCgAxFiEETclnbO+ag+mPygJ4TWYghDzYf1oFAmZVh/kTHHdlbmNoZW5A +YXBhY2hlLm9yZwAKCRBNZiCEPNh/WiYdD/9hTv6p373FBLLbZ2EMXaKvxNUE+CxW +pM/25bKRUVhb8bn/jE6rK6PUEGMk16p2rULbS5Ml2/KFCv31U3MYMyxXrKh3xMXt +Z+DDKcKv8tBjDaGxrGHBY9ob3ODU3Vng24HGtXKlAkesAbcbQfYsaVwI8Djl0tHT +bcJ48rXV+aoQUUpRq5TrPoKN9BOv5GL+GVPjxFXysejsnwmz2vusNYDBV2hScrAA +H2kwshbhX95zxxDQfP2jzZcEM/gFBHGYL9vbfS5yRpjjARP5LAJRFZRU9KL3evTa +g17B09/m5ED2OJdDgDrx+caZqIau8RnQYB1l723iO+BM7zJkW5qHHRsoMaf10Vvi +rDQrtIRE/YSEVmtWJYIuwLY2beloLFdUm1/4GwCMqkV+YpNEsBKqGsm31aqeP28Y +1w6sPQZXbo9
svn commit: r69417 - /dev/spark/v4.0.0-preview1-rc2-bin/
Author: wenchen Date: Tue May 28 06:35:58 2024 New Revision: 69417 Log: Deleting Removed: dev/spark/v4.0.0-preview1-rc2-bin/
(spark) branch branch-3.5 updated: [SPARK-41049][SQL][FOLLOW-UP] Mark map related expressions as stateful expressions
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new f42c029fac5c [SPARK-41049][SQL][FOLLOW-UP] Mark map related expressions as stateful expressions f42c029fac5c is described below commit f42c029fac5c8015d80ad957fae325243a2ed30d Author: Rui Wang AuthorDate: Mon May 27 22:40:13 2024 -0700 [SPARK-41049][SQL][FOLLOW-UP] Mark map related expressions as stateful expressions MapConcat contains a state so it is stateful: ``` private lazy val mapBuilder = new ArrayBasedMapBuilder(dataType.keyType, dataType.valueType) ``` Similarly `MapFromEntries, CreateMap, MapFromArrays, StringToMap, and TransformKeys` need the same change. Stateful expressions should be marked as stateful. Does this PR introduce any user-facing change? No. How was this patch tested? N/A. Was this patch authored or co-authored using generative AI tooling? No. Closes #46721 from amaliujia/statefulexpr. Authored-by: Rui Wang Signed-off-by: Wenchen Fan (cherry picked from commit af1ac1edc2a96c9aba949e3100ddae37b6f0e5b2) Signed-off-by: Wenchen Fan --- .../spark/sql/catalyst/expressions/collectionOperations.scala | 3 +++ .../spark/sql/catalyst/expressions/complexTypeCreator.scala| 6 ++ .../spark/sql/catalyst/expressions/higherOrderFunctions.scala | 2 ++ .../src/test/scala/org/apache/spark/sql/DataFrameSuite.scala | 10 +- 4 files changed, 20 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala index 3ddbe38fdedf..45896382af67 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala @@ -712,6 +712,7 @@ case class MapConcat(children: Seq[Expression]) } } + override def stateful: Boolean = true override def nullable: Boolean = children.exists(_.nullable) private lazy val mapBuilder = new ArrayBasedMapBuilder(dataType.keyType, dataType.valueType) @@ -827,6 +828,8 @@ case class MapFromEntries(child: Expression) override def nullable: Boolean = child.nullable || nullEntries + override def stateful: Boolean = true + @transient override lazy val dataType: MapType = dataTypeDetails.get._1 override def checkInputDataTypes(): TypeCheckResult = dataTypeDetails match { diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala index c95a0987330d..1b6f86984be7 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala @@ -242,6 +242,8 @@ case class CreateMap(children: Seq[Expression], useStringTypeWhenEmpty: Boolean) private lazy val mapBuilder = new ArrayBasedMapBuilder(dataType.keyType, dataType.valueType) + override def stateful: Boolean = true + override def eval(input: InternalRow): Any = { var i = 0 while (i < keys.length) { @@ -317,6 +319,8 @@ case class MapFromArrays(left: Expression, right: Expression) valueContainsNull = right.dataType.asInstanceOf[ArrayType].containsNull) } + override def stateful: Boolean = true + private lazy val mapBuilder = new ArrayBasedMapBuilder(dataType.keyType, dataType.valueType) override def nullSafeEval(keyArray: Any, valueArray:
Any): Any = { @@ -563,6 +567,8 @@ case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: E this(child, Literal(","), Literal(":")) } + override def stateful: Boolean = true + override def first: Expression = text override def second: Expression = pairDelim override def third: Expression = keyValueDelim diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala index fec1df108bcc..5b10b401af98 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala @@ -918,6 +918,8 @@ case class TransformKeys( override def dataType: MapType = MapType(function.dataType, valueType, valueContainsNull) + override def stateful: Boolean = true + override def checkInputDataTypes(): TypeCheckResult = { TypeUtils.checkForMapKeyType(function.dataType) } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuit
(spark) branch master updated: [SPARK-41049][SQL][FOLLOW-UP] Mark map related expressions as stateful expressions
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new af1ac1edc2a9 [SPARK-41049][SQL][FOLLOW-UP] Mark map related expressions as stateful expressions af1ac1edc2a9 is described below commit af1ac1edc2a96c9aba949e3100ddae37b6f0e5b2 Author: Rui Wang AuthorDate: Mon May 27 22:40:13 2024 -0700 [SPARK-41049][SQL][FOLLOW-UP] Mark map related expressions as stateful expressions ### What changes were proposed in this pull request? MapConcat contains a state so it is stateful: ``` private lazy val mapBuilder = new ArrayBasedMapBuilder(dataType.keyType, dataType.valueType) ``` Similarly `MapFromEntries, CreateMap, MapFromArrays, StringToMap, and TransformKeys` need the same change. ### Why are the changes needed? Stateful expressions should be marked as stateful. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? No Closes #46721 from amaliujia/statefulexpr. Authored-by: Rui Wang Signed-off-by: Wenchen Fan --- .../spark/sql/catalyst/expressions/collectionOperations.scala | 3 +++ .../spark/sql/catalyst/expressions/complexTypeCreator.scala| 6 ++ .../spark/sql/catalyst/expressions/higherOrderFunctions.scala | 2 ++ .../src/test/scala/org/apache/spark/sql/DataFrameSuite.scala | 10 +- 4 files changed, 20 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala index 632e2f3d3e97..ea117f876550 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala @@ -713,6 +713,7 @@ case class MapConcat(children: Seq[Expression]) } } + override def stateful: Boolean = true override def nullable: Boolean = children.exists(_.nullable) private lazy val mapBuilder = new ArrayBasedMapBuilder(dataType.keyType, dataType.valueType) @@ -828,6 +829,8 @@ case class MapFromEntries(child: Expression) override def nullable: Boolean = child.nullable || nullEntries + override def stateful: Boolean = true + @transient override lazy val dataType: MapType = dataTypeDetails.get._1 override def checkInputDataTypes(): TypeCheckResult = dataTypeDetails match { diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala index 4c0d00534060..167c02c0bafc 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala @@ -245,6 +245,8 @@ case class CreateMap(children: Seq[Expression], useStringTypeWhenEmpty: Boolean) private lazy val mapBuilder = new ArrayBasedMapBuilder(dataType.keyType, dataType.valueType) + override def stateful: Boolean = true + override def eval(input: InternalRow): Any = { var i = 0 while (i < keys.length) { @@ -320,6 +322,8 @@ case class MapFromArrays(left: Expression, right: Expression) valueContainsNull = right.dataType.asInstanceOf[ArrayType].containsNull) } + override def stateful: Boolean = true + private lazy
val mapBuilder = new ArrayBasedMapBuilder(dataType.keyType, dataType.valueType) override def nullSafeEval(keyArray: Any, valueArray: Any): Any = { @@ -568,6 +572,8 @@ case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: E this(child, Literal(","), Literal(":")) } + override def stateful: Boolean = true + override def first: Expression = text override def second: Expression = pairDelim override def third: Expression = keyValueDelim diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala index 896f3e9774f3..80bcf156133e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala @@ -920,6 +920,8 @@ case class TransformKeys( override def dataType: MapType = MapType(function.dataType, valueType, valueContainsNull) + override def stateful: Boolean = true + override
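[Editor's note] For context on why the flag matters: Spark treats an expression as stateful when the same instance cannot safely be shared or reused, and makes a fresh copy before evaluation. A minimal Scala sketch of the hazard: an expression instance that lazily caches a mutable builder, loosely modelled on `ArrayBasedMapBuilder`'s duplicate-key tracking. (The real builder is reset between rows; the point is that a shared instance with internal state is still unsafe without fresh copies.)

```
import scala.collection.mutable

// Stand-in for ArrayBasedMapBuilder: tracks keys to reject duplicates,
// i.e. mutable state owned by the expression instance.
class MapBuilderLike {
  private val seen = mutable.Set.empty[Int]
  def put(key: Int): Unit = require(seen.add(key), s"duplicate map key $key")
}

class CreateMapLike {
  // Mirrors `private lazy val mapBuilder = new ArrayBasedMapBuilder(...)`.
  private lazy val builder = new MapBuilderLike
  def eval(keys: Seq[Int]): Unit = keys.foreach(builder.put)
}

val shared = new CreateMapLike
shared.eval(Seq(1, 2))
// If the engine shares one instance across rows or threads without a fresh
// copy (or a reset), state leaks from the previous evaluation:
shared.eval(Seq(1)) // throws IllegalArgumentException: duplicate map key 1
```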
svn commit: r69416 - in /dev/spark/v4.0.0-preview1-rc2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/articles/sparkr-vignettes_files/ _site/api/R/articles/sparkr-vignettes_
Author: wenchen Date: Tue May 28 05:29:59 2024 New Revision: 69416 Log: Apache Spark v4.0.0-preview1-rc2 docs [This commit notification would consist of 4816 parts, which exceeds the limit of 50, so it was shortened to this summary.]
svn commit: r69415 - /dev/spark/v4.0.0-preview1-rc2-bin/
Author: wenchen Date: Tue May 28 04:31:46 2024 New Revision: 69415 Log: Apache Spark v4.0.0-preview1-rc2 Added: dev/spark/v4.0.0-preview1-rc2-bin/ dev/spark/v4.0.0-preview1-rc2-bin/SparkR_4.0.0-preview1.tar.gz (with props) dev/spark/v4.0.0-preview1-rc2-bin/SparkR_4.0.0-preview1.tar.gz.asc dev/spark/v4.0.0-preview1-rc2-bin/SparkR_4.0.0-preview1.tar.gz.sha512 dev/spark/v4.0.0-preview1-rc2-bin/pyspark-4.0.0.dev1.tar.gz (with props) dev/spark/v4.0.0-preview1-rc2-bin/pyspark-4.0.0.dev1.tar.gz.asc dev/spark/v4.0.0-preview1-rc2-bin/pyspark-4.0.0.dev1.tar.gz.sha512 dev/spark/v4.0.0-preview1-rc2-bin/spark-4.0.0-preview1-bin-hadoop3.tgz (with props) dev/spark/v4.0.0-preview1-rc2-bin/spark-4.0.0-preview1-bin-hadoop3.tgz.asc dev/spark/v4.0.0-preview1-rc2-bin/spark-4.0.0-preview1-bin-hadoop3.tgz.sha512 dev/spark/v4.0.0-preview1-rc2-bin/spark-4.0.0-preview1-bin-without-hadoop.tgz (with props) dev/spark/v4.0.0-preview1-rc2-bin/spark-4.0.0-preview1-bin-without-hadoop.tgz.asc dev/spark/v4.0.0-preview1-rc2-bin/spark-4.0.0-preview1-bin-without-hadoop.tgz.sha512 dev/spark/v4.0.0-preview1-rc2-bin/spark-4.0.0-preview1.tgz (with props) dev/spark/v4.0.0-preview1-rc2-bin/spark-4.0.0-preview1.tgz.asc dev/spark/v4.0.0-preview1-rc2-bin/spark-4.0.0-preview1.tgz.sha512 Added: dev/spark/v4.0.0-preview1-rc2-bin/SparkR_4.0.0-preview1.tar.gz == Binary file - no diff available. Propchange: dev/spark/v4.0.0-preview1-rc2-bin/SparkR_4.0.0-preview1.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v4.0.0-preview1-rc2-bin/SparkR_4.0.0-preview1.tar.gz.asc == --- dev/spark/v4.0.0-preview1-rc2-bin/SparkR_4.0.0-preview1.tar.gz.asc (added) +++ dev/spark/v4.0.0-preview1-rc2-bin/SparkR_4.0.0-preview1.tar.gz.asc Tue May 28 04:31:46 2024 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJHBAABCgAxFiEETclnbO+ag+mPygJ4TWYghDzYf1oFAmZVW6kTHHdlbmNoZW5A +YXBhY2hlLm9yZwAKCRBNZiCEPNh/WvFCEACK8aTOoX+9lkY3dGTcjk9YY2RmlHzg +k/gew4ZlH9liGSpXHcSPJ6iG/Su63rshfveHmM2ycKOkxLbzXbeRMBHtmQGI8sQf +q+usChfuexo0GH4kBsMU0xtJkUe3SCwRdIr6aDq5eH6yxVEPNrqyzbCekwmqgE7y +KV37qb7EQyq3sSZH0HFrAgEhgMMvQRRp/SD+WnHVoY4/dEtksZ4ip0TjXImKWIZG +HowM6Xks7M/qXsnk2kXzbrSY/lpWbGcBVcTr3Hh+z0iYMS05ohXk8JRx7hMmhUGc +sBcAYwupNzyai/lFWpToe17E6QI1mSIiG2CqOgtuYXs0za05673mZ6GcVwiLTrNz +tGH0CBY2G+9iEjHYR51bJTlIs6J9KvHz/CJmO2OUk9s14LHGLpQ3DPbEiHQ/r9Ic +Jb+WhDe/7Ajq8Ohq3bXm2fJIs7vDDC9bATFixaAY5o/jj6Q7hWeokZN7tyjFKigf +yoTCtkXPa+WHz2JueiY21EXBu8pD/S0GKsy1wctyT4WiykBBB5M1Ue43D+UerKgK +i/4UqZnEVTAVVMX1YkGgz5RxC6D/UdzmknNMbric7CFF/Imst0VYR7OBJtLRRl8X +8REh4Am6wFfjgEhM0zZOCFZha1Dd9isHykzG2sRtf0BNXsGkRplPjMGmqGul71Vd +4BHp/DiV4go/3g== +=SsBi +-END PGP SIGNATURE- Added: dev/spark/v4.0.0-preview1-rc2-bin/SparkR_4.0.0-preview1.tar.gz.sha512 == --- dev/spark/v4.0.0-preview1-rc2-bin/SparkR_4.0.0-preview1.tar.gz.sha512 (added) +++ dev/spark/v4.0.0-preview1-rc2-bin/SparkR_4.0.0-preview1.tar.gz.sha512 Tue May 28 04:31:46 2024 @@ -0,0 +1 @@ +dec2bf5ec07c86af950dcbe518be1fd5155d55c7a4c9b8e83c69e11dc2395806a18e526a3d2096c2d770569f0c2032d6fa96c7ade2ce83ded98ed6b5e26a SparkR_4.0.0-preview1.tar.gz Added: dev/spark/v4.0.0-preview1-rc2-bin/pyspark-4.0.0.dev1.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v4.0.0-preview1-rc2-bin/pyspark-4.0.0.dev1.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v4.0.0-preview1-rc2-bin/pyspark-4.0.0.dev1.tar.gz.asc == --- dev/spark/v4.0.0-preview1-rc2-bin/pyspark-4.0.0.dev1.tar.gz.asc (added) +++ dev/spark/v4.0.0-preview1-rc2-bin/pyspark-4.0.0.dev1.tar.gz.asc Tue May 28 04:31:46 2024 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJHBAABCgAxFiEETclnbO+ag+mPygJ4TWYghDzYf1oFAmZVW6sTHHdlbmNoZW5A +YXBhY2hlLm9yZwAKCRBNZiCEPNh/WgTOD/0UDo7uwuyjgxgtWNGD9uLxLs/JbhPp +iNzKX4h3wWD0yyvfYGIt4HGbBRNTiJ071PgLhQe3HC+e5Di2bvjW4RAa6OzhssFF +R6UVojuQK0AKR2+BidQNR/ODfx5wRyU5uPx8qJu7egjHepR+q/1NR+A/dyQpi93w +5cpIqqWNC3pd/JSQ1nIBM0jxWJuGtmm0IvMvhwyRuUdpZzo2ONpEjJnlNn9tCR1Z +xyRJnXnj/Zqd468E5Wn59iZJtwK7rSe1hNNYivLInEc+paDRZtKNz+xl/LWnXgzq +R4eIiRAiOjZnQtfuZceXb3rftFbzcxkzD1hvb1MxQO+Vf/tAcste1G3d+RJdEhdg +fPsOATbFe2K7+DHwwU1QnN2Pse/exuXCCa9KmJJXcGo8hnLEb2naDt3GuaweDb97 +CuwAqLcbwAJvng8G9RsZ8q+uKx06linFScOzgIw9Y8YzbubH4jy8PlgnZ+OYTM4p +PYfj81c91/ZTv0KgPCkpPTpYkjZfQkrTzHF8rAodJT1EheyGfWvEotbmgwUqH8Gm +nuNfkSmKBrzPpExUFvJiIlEapzg7C4u/mMO8WEOuLYKtwtOR9wwiZdPL0tTp16Ve +luxFjEHKkzQzB/TyA6QsK1FO92PlCyXAXz7jHsccU7Fip
(spark) branch master updated: [SPARK-48239][INFRA][FOLLOWUP] install the missing `jq` library
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 416d7f24fc35 [SPARK-48239][INFRA][FOLLOWUP] install the missing `jq` library 416d7f24fc35 is described below commit 416d7f24fc354e912773ceb160210ad6a0c5fe99 Author: Wenchen Fan AuthorDate: Fri May 24 20:53:00 2024 -0700 [SPARK-48239][INFRA][FOLLOWUP] install the missing `jq` library ### What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/46534 . We missed the `jq` library, which is needed to create git tags. ### Why are the changes needed? fix bug ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? manual ### Was this patch authored or co-authored using generative AI tooling? no Closes #46743 from cloud-fan/script. Authored-by: Wenchen Fan Signed-off-by: Wenchen Fan --- dev/create-release/release-util.sh | 3 +++ dev/create-release/spark-rm/Dockerfile | 1 + 2 files changed, 4 insertions(+) diff --git a/dev/create-release/release-util.sh b/dev/create-release/release-util.sh index 0394fb49c2fa..b5edbf40d487 100755 --- a/dev/create-release/release-util.sh +++ b/dev/create-release/release-util.sh @@ -128,6 +128,9 @@ function get_release_info { RC_COUNT=1 fi + if [ "$GIT_BRANCH" = "master" ]; then +RELEASE_VERSION="$RELEASE_VERSION-preview1" + fi export NEXT_VERSION export RELEASE_VERSION=$(read_config "Release" "$RELEASE_VERSION") diff --git a/dev/create-release/spark-rm/Dockerfile b/dev/create-release/spark-rm/Dockerfile index adaa4df3f579..5fdaf58feee2 100644 --- a/dev/create-release/spark-rm/Dockerfile +++ b/dev/create-release/spark-rm/Dockerfile @@ -58,6 +58,7 @@ RUN apt-get update && apt-get install -y \ texinfo \ texlive-latex-extra \ qpdf \ +jq \ r-base \ ruby \ ruby-dev \
(spark) tag v4.0.0-preview1-rc2 created (now 7cfe5a6e44e8)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to tag v4.0.0-preview1-rc2 in repository https://gitbox.apache.org/repos/asf/spark.git at 7cfe5a6e44e8 (commit) This tag includes the following new commits: new 7cfe5a6e44e8 Preparing Spark release v4.0.0-preview1-rc2 The 1 revision listed above as "new" is entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
(spark) 01/01: Preparing Spark release v4.0.0-preview1-rc2
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to tag v4.0.0-preview1-rc2 in repository https://gitbox.apache.org/repos/asf/spark.git commit 7cfe5a6e44e8d7079ae29ad3e2cee7231cd3dc66 Author: Wenchen Fan AuthorDate: Fri May 24 18:53:15 2024 + Preparing Spark release v4.0.0-preview1-rc2 --- R/pkg/R/sparkR.R | 4 ++-- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- common/utils/pom.xml | 2 +- common/variant/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/profiler/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/yarn/pom.xml | 2 +- sql/api/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 46 files changed, 49 insertions(+), 49 deletions(-) diff --git a/R/pkg/R/sparkR.R b/R/pkg/R/sparkR.R index 0be7e5da24d2..478acf514ef3 100644 --- a/R/pkg/R/sparkR.R +++ b/R/pkg/R/sparkR.R @@ -456,8 +456,8 @@ sparkR.session <- function( # Check if version number of SparkSession matches version number of SparkR package jvmVersion <- callJMethod(sparkSession, "version") - # Remove -SNAPSHOT from jvm versions - jvmVersionStrip <- gsub("-SNAPSHOT", "", jvmVersion, fixed = TRUE) + # Remove -preview1 from jvm versions + jvmVersionStrip <- gsub("-preview1", "", jvmVersion, fixed = TRUE) rPackageVersion <- paste0(packageVersion("SparkR")) if (jvmVersionStrip != rPackageVersion) { diff --git a/assembly/pom.xml b/assembly/pom.xml index 58e7ae5bb0c7..417e7c23ca9f 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.13 -4.0.0-SNAPSHOT +4.0.0-preview1 ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 046648e9c2ae..e1a4497387a2 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.13 -4.0.0-SNAPSHOT +4.0.0-preview1 ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index cdb5bd72158a..d8dff6996cec 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.13 -4.0.0-SNAPSHOT +4.0.0-preview1 ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 0f7036ef
(spark) tag v4.0.0-preview-rc1 deleted (was 9fec87d16a04)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to tag v4.0.0-preview-rc1 in repository https://gitbox.apache.org/repos/asf/spark.git *** WARNING: tag v4.0.0-preview-rc1 was deleted! *** was 9fec87d16a04 Preparing Spark release v4.0.0-preview-rc1 This change permanently discards the following revisions: discard 9fec87d16a04 Preparing Spark release v4.0.0-preview-rc1
(spark) branch master updated: [SPARK-48364][SQL] Add AbstractMapType type casting and fix RaiseError parameter map to work with collated strings
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6be3560f3c89 [SPARK-48364][SQL] Add AbstractMapType type casting and fix RaiseError parameter map to work with collated strings 6be3560f3c89 is described below commit 6be3560f3c89e212e850a0788d24a7c0755ea35b Author: Uros Bojanic <157381213+uros...@users.noreply.github.com> AuthorDate: Wed May 22 05:21:23 2024 -0700 [SPARK-48364][SQL] Add AbstractMapType type casting and fix RaiseError parameter map to work with collated strings ### What changes were proposed in this pull request? Following up on the introduction of AbstractMapType (https://github.com/apache/spark/pull/46458) and changes that introduce collation awareness for RaiseError expression (https://github.com/apache/spark/pull/46461), this PR adds the appropriate type casting rules for AbstractMapType. ### Why are the changes needed? Fix the CI failure for the `Support RaiseError misc expression with collation` test when ANSI is off. ### Does this PR introduce _any_ user-facing change? Yes, type casting is now allowed for map types with collated strings. ### How was this patch tested? Extended suite `CollationSQLExpressionsANSIOffSuite` with ANSI disabled. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46661 from uros-db/fix-abstract-map. Authored-by: Uros Bojanic <157381213+uros...@users.noreply.github.com> Signed-off-by: Wenchen Fan --- .../sql/catalyst/analysis/CollationTypeCasts.scala | 15 - .../spark/sql/catalyst/analysis/TypeCoercion.scala | 13 +-- .../spark/sql/catalyst/expressions/misc.scala | 4 ++-- .../spark/sql/CollationSQLExpressionsSuite.scala | 10 +++-- .../org/apache/spark/sql/CollationSuite.scala | 25 ++ 5 files changed, 37 insertions(+), 30 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala index a50dad7c8cdb..00abdf4ee19d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala @@ -25,7 +25,7 @@ import org.apache.spark.sql.catalyst.analysis.TypeCoercion.{hasStringType, haveS import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.errors.QueryCompilationErrors import org.apache.spark.sql.internal.SQLConf -import org.apache.spark.sql.types.{ArrayType, DataType, StringType} +import org.apache.spark.sql.types.{ArrayType, DataType, MapType, StringType} object CollationTypeCasts extends TypeCoercionRule { override val transform: PartialFunction[Expression, Expression] = { @@ -85,6 +85,11 @@ object CollationTypeCasts extends TypeCoercionRule { private def extractStringType(dt: DataType): StringType = dt match { case st: StringType => st case ArrayType(et, _) => extractStringType(et) +case MapType(kt, vt, _) => if (hasStringType(kt)) { +extractStringType(kt) + } else { +extractStringType(vt) + } } /** @@ -102,6 +107,14 @@ object CollationTypeCasts extends TypeCoercionRule { case st: StringType if st.collationId != castType.collationId => castType case ArrayType(arrType, nullable) => castStringType(arrType, castType).map(ArrayType(_, nullable)).orNull + case MapType(keyType, valueType, nullable) => +val newKeyType =
castStringType(keyType, castType).getOrElse(keyType) +val newValueType = castStringType(valueType, castType).getOrElse(valueType) +if (newKeyType != keyType || newValueType != valueType) { + MapType(newKeyType, newValueType, nullable) +} else { + null +} case _ => null } Option(ret) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala index 936bb22baa46..7866f47c28b1 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala @@ -31,7 +31,7 @@ import org.apache.spark.sql.catalyst.trees.AlwaysProcess import org.apache.spark.sql.catalyst.types.DataTypeUtils import org.apache.spark.sql.errors.QueryCompilationErrors import org.apache.spark.sql.internal.SQLConf -import org.apache.spark.sql.internal.types.{AbstractArrayType, AbstractStringType, StringTypeAnyCollation} +import or
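[Editor's note] The new `castStringType` branch recursively pushes the target collation into map key and value types, returning `null` (no cast needed) when neither side changes. A self-contained Scala sketch of that logic on a toy type algebra, with `None` standing in for the real code's `null` (type names are illustrative, not Spark's):

```
sealed trait DType
case class Str(collationId: Int) extends DType
case class Arr(elem: DType) extends DType
case class MapT(key: DType, value: DType) extends DType
case object IntT extends DType

// Mirrors CollationTypeCasts.castStringType: Some(newType) if a cast is
// needed anywhere inside `dt`, None otherwise.
def castStringType(dt: DType, target: Str): Option[DType] = dt match {
  case Str(id) if id != target.collationId => Some(target)
  case Arr(e) => castStringType(e, target).map(Arr(_))
  case MapT(k, v) =>
    val newK = castStringType(k, target)
    val newV = castStringType(v, target)
    if (newK.isEmpty && newV.isEmpty) None
    else Some(MapT(newK.getOrElse(k), newV.getOrElse(v)))
  case _ => None
}

println(castStringType(MapT(Str(0), IntT), Str(1)))
// => Some(MapT(Str(1), IntT)): only the string key is re-collated
println(castStringType(MapT(IntT, IntT), Str(1)))
// => None: no collated strings anywhere under the map, so no cast
```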
(spark) branch master updated: [SPARK-48215][SQL] Extending support for collated strings on date_format expression
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e04d3d7c430a [SPARK-48215][SQL] Extending support for collated strings on date_format expression e04d3d7c430a is described below commit e04d3d7c430a1fa446f0379680f619b8b14b5eb5 Author: Nebojsa Savic AuthorDate: Wed May 22 04:28:06 2024 -0700 [SPARK-48215][SQL] Extending support for collated strings on date_format expression ### What changes were proposed in this pull request? We are extending support for collated strings on the date_format function, since it currently throws a DATATYPE_MISMATCH exception when collated strings are passed as the "format" parameter. https://docs.databricks.com/en/sql/language-manual/functions/date_format.html ### Why are the changes needed? An exception is thrown on invocation when collated strings are passed as arguments to date_format. ### Does this PR introduce _any_ user-facing change? No user-facing changes; extending support. ### How was this patch tested? Tests are added with this PR. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46561 from nebojsa-db/SPARK-48215. Authored-by: Nebojsa Savic Signed-off-by: Wenchen Fan --- .../catalyst/expressions/datetimeExpressions.scala | 5 ++-- .../spark/sql/CollationSQLExpressionsSuite.scala | 32 ++ 2 files changed, 35 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala index 081a42f5608e..8caf8c5d48c2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala @@ -36,6 +36,7 @@ import org.apache.spark.sql.catalyst.util.DateTimeUtils._ import org.apache.spark.sql.catalyst.util.LegacyDateFormats.SIMPLE_DATE_FORMAT import org.apache.spark.sql.errors.{QueryCompilationErrors, QueryExecutionErrors} import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.internal.types.StringTypeAnyCollation import org.apache.spark.sql.types._ import org.apache.spark.sql.types.DayTimeIntervalType.DAY import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String} @@ -951,9 +952,9 @@ case class DateFormatClass(left: Expression, right: Expression, timeZoneId: Opti def this(left: Expression, right: Expression) = this(left, right, None) - override def dataType: DataType = StringType + override def dataType: DataType = SQLConf.get.defaultStringType - override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType, StringType) + override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType, StringTypeAnyCollation) override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = copy(timeZoneId = Option(timeZoneId)) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala index 0d48f9f0a88d..828245bb3fdd 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala @@ -1600,6 +1600,38 @@ class CollationSQLExpressionsSuite }) } + test("DateFormat expression with collation") { +case
class DateFormatTestCase[R](date: String, format: String, collation: String, result: R) +val testCases = Seq( + DateFormatTestCase("2021-01-01", "-MM-dd", "UTF8_BINARY", "2021-01-01"), + DateFormatTestCase("2021-01-01", "-dd", "UTF8_BINARY_LCASE", "2021-01"), + DateFormatTestCase("2021-01-01", "-MM-dd", "UNICODE", "2021-01-01"), + DateFormatTestCase("2021-01-01", "", "UNICODE_CI", "2021") +) + +for { + collateDate <- Seq(true, false) + collateFormat <- Seq(true, false) +} { + testCases.foreach(t => { +val dateArg = if (collateDate) s"collate('${t.date}', '${t.collation}')" else s"'${t.date}'" +val formatArg = + if (collateFormat) { +s"collate('${t.format}', '${t.collation}')" + } else { +s"'${t.format}'" + } + +withSQLConf(SqlApiConf.DEFAULT_COLLATION -> t.collation) { + val query = s"SELECT date_format(${dateArg}, ${formatArg})" +
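To make the effect concrete, here is a minimal usage sketch (not part of the patch; it assumes a `SparkSession` named `spark` on a build with collation support) showing that a collated `format` argument now passes analysis instead of raising `DATATYPE_MISMATCH`:

```scala
// Hypothetical example: the "format" argument carries an explicit collation.
val df = spark.sql(
  "SELECT date_format('2021-01-01', collate('yyyy-MM-dd', 'UNICODE_CI'))")
df.show() // expected single value: 2021-01-01
```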
(spark) branch master updated: [SPARK-48031] Decompose viewSchemaMode config, add SHOW CREATE TABLE support
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 617ac1aec748 [SPARK-48031] Decompose viewSchemaMode config, add SHOW CREATE TABLE support 617ac1aec748 is described below commit 617ac1aec7481d6063af539b02980692e98beb70 Author: Serge Rielau AuthorDate: Mon May 20 16:01:24 2024 +0800 [SPARK-48031] Decompose viewSchemaMode config, add SHOW CREATE TABLE support ### What changes were proposed in this pull request? We separate enablement of the WITH SCHEMA ... clause from the change in default from SCHEMA BINDING to SCHEMA COMPENSATION. This allows users to upgrade in three steps: 1. Enable the feature and deal with DESCRIBE EXTENDED. 2. Get their affairs in order, using ALTER VIEW ... WITH SCHEMA BINDING for those views they aim to keep in that mode. 3. Switch the default. ### Why are the changes needed? It allows customers to upgrade more safely. ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? Added more tests ### Was this patch authored or co-authored using generative AI tooling? No Closes #46652 from srielau/SPARK-48031-view-evolutiion-part2. Lead-authored-by: Serge Rielau Co-authored-by: Wenchen Fan Signed-off-by: Wenchen Fan --- docs/sql-migration-guide.md| 3 +- .../sql/catalyst/catalog/SessionCatalog.scala | 6 +- .../spark/sql/catalyst/catalog/interface.scala | 6 +- .../spark/sql/catalyst/parser/AstBuilder.scala | 14 +- .../org/apache/spark/sql/internal/SQLConf.scala| 26 ++- .../spark/sql/execution/command/tables.scala | 7 + .../view-schema-binding-config.sql.out | 166 +-- .../analyzer-results/view-schema-binding.sql.out | 24 +-- .../inputs/view-schema-binding-config.sql | 52 +++-- .../sql-tests/inputs/view-schema-binding.sql | 2 +- .../sql-tests/results/charvarchar.sql.out | 1 + .../sql-tests/results/show-create-table.sql.out| 6 + .../results/view-schema-binding-config.sql.out | 231 ++--- .../sql-tests/results/view-schema-binding.sql.out | 25 +-- .../apache/spark/sql/execution/SQLViewSuite.scala | 2 +- .../spark/sql/execution/SQLViewTestSuite.scala | 7 +- 16 files changed, 453 insertions(+), 125 deletions(-) diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 15205e9284cd..02a4fae5d262 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -54,7 +54,8 @@ license: | - Since Spark 4.0, the default value for `spark.sql.legacy.ctePrecedencePolicy` has been changed from `EXCEPTION` to `CORRECTED`. Instead of raising an error, inner CTE definitions take precedence over outer definitions. - Since Spark 4.0, the default value for `spark.sql.legacy.timeParserPolicy` has been changed from `EXCEPTION` to `CORRECTED`. Instead of raising an `INCONSISTENT_BEHAVIOR_CROSS_VERSION` error, `CANNOT_PARSE_TIMESTAMP` will be raised if ANSI mode is enabled. `NULL` will be returned if ANSI mode is disabled. See [Datetime Patterns for Formatting and Parsing](sql-ref-datetime-pattern.html). - Since Spark 4.0, a bug falsely allowing `!` instead of `NOT` when `!` is not a prefix operator has been fixed. Clauses such as `expr ! IN (...)`, `expr ! BETWEEN ...`, or `col ! NULL` now raise syntax errors. To restore the previous behavior, set `spark.sql.legacy.bangEqualsNot` to `true`. -- Since Spark 4.0, views allow control over how they react to underlying query changes. By default views tolerate column type changes in the query and compensate with casts.
To restore the previous behavior, allowing up-casts only, set `spark.sql.viewSchemaBindingMode` to `DISABLED`. This disables the feature and also disallows the `WITH SCHEMA` clause. +- Since Spark 4.0, by default views tolerate column type changes in the query and compensate with casts. To restore the previous behavior, allowing up-casts only, set `spark.sql.legacy.viewSchemaCompensation` to `false`. +- Since Spark 4.0, views allow control over how they react to underlying query changes. By default views tolerate column type changes in the query and compensate with casts. To disable this feature, set `spark.sql.legacy.viewSchemaBindingMode` to `false`. This also removes the clause from `DESCRIBE EXTENDED` and `SHOW CREATE TABLE`. ## Upgrading from Spark SQL 3.5.1 to 3.5.2 diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala index 96883afcfc5c..dbf2102a183a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog
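Taken together with the migration-guide text above, the two configs suggest a staged upgrade along the lines of the sketch below (the view name is illustrative, and the config semantics are taken from the guide text rather than verified against the patch):

```scala
// Step 1: with the feature enabled, review the WITH SCHEMA clause that now
// appears in DESCRIBE EXTENDED / SHOW CREATE TABLE output.
spark.sql("SHOW CREATE TABLE some_view").show(truncate = false)

// Step 2: pin the views that should keep the old semantics.
spark.sql("ALTER VIEW some_view WITH SCHEMA BINDING")

// Step 3: switch the default from SCHEMA BINDING to SCHEMA COMPENSATION.
spark.conf.set("spark.sql.legacy.viewSchemaCompensation", "true")
```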
(spark) branch master updated: [SPARK-48305][SQL] Add collation support for CurrentLike expressions
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6a17b794338b [SPARK-48305][SQL] Add collation support for CurrentLike expressions 6a17b794338b is described below commit 6a17b794338b0473c11ae17e5c8f1450c0b3f358 Author: Uros Bojanic <157381213+uros...@users.noreply.github.com> AuthorDate: Mon May 20 15:51:23 2024 +0800 [SPARK-48305][SQL] Add collation support for CurrentLike expressions ### What changes were proposed in this pull request? Introduce collation awareness for CurrentLike expressions: current_database/current_schema, current_catalog, user/current_user/session_user. ### Why are the changes needed? Add collation support for CurrentLike expressions in Spark. ### Does this PR introduce _any_ user-facing change? Yes, users should now be able to use collated strings within arguments for CurrentLike functions: current_database/current_schema, current_catalog, user/current_user/session_user. ### How was this patch tested? E2e sql tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46613 from uros-db/current-like-expressions. Authored-by: Uros Bojanic <157381213+uros...@users.noreply.github.com> Signed-off-by: Wenchen Fan --- .../org/apache/spark/sql/catalyst/expressions/misc.scala | 6 +++--- .../spark/sql/catalyst/optimizer/finishAnalysis.scala| 7 --- .../apache/spark/sql/CollationSQLExpressionsSuite.scala | 16 3 files changed, 23 insertions(+), 6 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala index eda65ae48f00..e9fa362de14c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala @@ -200,7 +200,7 @@ object AssertTrue { since = "1.6.0", group = "misc_funcs") case class CurrentDatabase() extends LeafExpression with Unevaluable { - override def dataType: DataType = StringType + override def dataType: DataType = SQLConf.get.defaultStringType override def nullable: Boolean = false override def prettyName: String = "current_schema" final override val nodePatterns: Seq[TreePattern] = Seq(CURRENT_LIKE) @@ -219,7 +219,7 @@ case class CurrentDatabase() extends LeafExpression with Unevaluable { since = "3.1.0", group = "misc_funcs") case class CurrentCatalog() extends LeafExpression with Unevaluable { - override def dataType: DataType = StringType + override def dataType: DataType = SQLConf.get.defaultStringType override def nullable: Boolean = false override def prettyName: String = "current_catalog" final override val nodePatterns: Seq[TreePattern] = Seq(CURRENT_LIKE) @@ -335,7 +335,7 @@ case class TypeOf(child: Expression) extends UnaryExpression { // scalastyle:on line.size.limit case class CurrentUser() extends LeafExpression with Unevaluable { override def nullable: Boolean = false - override def dataType: DataType = StringType + override def dataType: DataType = SQLConf.get.defaultStringType override def prettyName: String = getTagValue(FunctionRegistry.FUNC_ALIAS).getOrElse("current_user") final override val nodePatterns: Seq[TreePattern] = Seq(CURRENT_LIKE) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala index 92ac7599a8ff..48753fbfe326 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala @@ -33,6 +33,7 @@ import org.apache.spark.sql.catalyst.util.DateTimeUtils import org.apache.spark.sql.catalyst.util.DateTimeUtils.{convertSpecialDate, convertSpecialTimestamp, convertSpecialTimestampNTZ, instantToMicros, localDateTimeToMicros} import org.apache.spark.sql.catalyst.util.TypeUtils.toSQLExpr import org.apache.spark.sql.connector.catalog.CatalogManager +import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types._ @@ -151,11 +152,11 @@ case class ReplaceCurrentLike(catalogManager: CatalogManager) extends Rule[Logic plan.transformAllExpressionsWithPruning(_.containsPattern(CURRENT_LIKE)) { case CurrentDatabase() => -Literal.create(currentNamespace, StringType) +Literal.create(currentNamespace, SQLConf.get.defaultStringType) case CurrentCatalog() => -
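A short sketch of the observable effect (assuming a session `spark`; the conf key is referenced through the `SqlApiConf.DEFAULT_COLLATION` constant used by the collation test suites): with a non-default session collation, current-like expressions resolve to the session-default string type rather than plain `StringType`:

```scala
import org.apache.spark.sql.internal.SqlApiConf
import org.apache.spark.sql.types.StringType

// Assumed: the session default collation can be set via this conf key.
spark.conf.set(SqlApiConf.DEFAULT_COLLATION, "UNICODE")
val dt = spark.sql("SELECT current_catalog()").schema.head.dataType
assert(dt == StringType("UNICODE")) // previously always the binary StringType
```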
(spark) branch master updated: [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6f6b4860268d [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE 6f6b4860268d is described below commit 6f6b4860268dc250d8e31a251d740733798aa512 Author: Stefan Kandic AuthorDate: Sat May 18 15:17:56 2024 +0800 [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE ### What changes were proposed in this pull request? Changing serialization and deserialization of collated strings so that the collation information is put in the metadata of the enclosing struct field - and then read back from there during parsing. Format of serialization will look something like this:
```json
{
  "type": "struct",
  "fields": [
    {
      "name": "colName",
      "type": "string",
      "nullable": true,
      "metadata": {
        "__COLLATIONS": {
          "colName": "UNICODE"
        }
      }
    }
  ]
}
```
If we have a map we will add suffixes `.key` and `.value` in the metadata:
```json
{
  "type": "struct",
  "fields": [
    {
      "name": "mapField",
      "type": {
        "type": "map",
        "keyType": "string",
        "valueType": "string",
        "valueContainsNull": true
      },
      "nullable": true,
      "metadata": {
        "__COLLATIONS": {
          "mapField.key": "UNICODE",
          "mapField.value": "UNICODE"
        }
      }
    }
  ]
}
```
It will be a similar story for arrays (we will add the `.element` suffix). We could have multiple suffixes when working with deeply nested data types (Map[String, Array[Array[String]]] - see tests for this example) ### Why are the changes needed? Putting collation info in field metadata is the only way to not break old clients reading new tables with collations. `CharVarcharUtils` does a similar thing but this is much less hacky, and more friendly for all third-party clients - which is especially important since Delta also uses Spark for schema ser/de. It will also remove the need for additional logic introduced in #46083 to remove collations before writing to HMS as this way the tables will be fully HMS compatible. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? With unit tests ### Was this patch authored or co-authored using generative AI tooling? No Closes #46280 from stefankandic/newDeltaSchema.
Lead-authored-by: Stefan Kandic Co-authored-by: Stefan Kandic <154237371+stefankan...@users.noreply.github.com> Signed-off-by: Wenchen Fan --- .../spark/sql/catalyst/util/CollationFactory.java | 99 +++- .../src/main/resources/error/error-conditions.json | 12 + python/pyspark/errors/error-conditions.json| 10 + .../pyspark/sql/tests/connect/test_parity_types.py | 4 + python/pyspark/sql/tests/test_types.py | 249 +++-- python/pyspark/sql/types.py| 178 +-- .../org/apache/spark/sql/types/DataType.scala | 74 +- .../org/apache/spark/sql/types/StringType.scala| 7 + .../org/apache/spark/sql/types/StructField.scala | 62 - .../org/apache/spark/sql/types/DataTypeSuite.scala | 181 ++- .../apache/spark/sql/types/StructTypeSuite.scala | 183 +++ .../streaming/StreamingDeduplicationSuite.scala| 2 +- .../spark/sql/streaming/StreamingQuerySuite.scala | 2 +- 13 files changed, 1004 insertions(+), 59 deletions(-) diff --git a/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java b/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java index 863445b6..0133c3feb611 100644 --- a/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java +++ b/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java @@ -36,11 +36,62 @@ import org.apache.spark.unsafe.types.UTF8String; * Provides functionality to the UTF8String object which respects defined collation settings. */ public final class CollationFactory { + + /** + * Identifier for a single collation. + */ + public static class CollationIdentifier { +private final String provider; +private final String name
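The JSON shape from the commit message can be reproduced directly from a schema. A minimal round-trip sketch (assuming a build with collation support):

```scala
import org.apache.spark.sql.types._

// The collated type serializes as plain "string"; the collation is recorded in
// the enclosing field's metadata under __COLLATIONS and restored when parsing.
val schema = StructType(Seq(StructField("colName", StringType("UNICODE"))))
println(schema.json) // metadata contains {"__COLLATIONS":{"colName":"UNICODE"}}
assert(DataType.fromJson(schema.json) == schema)
```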
(spark) branch master updated (15fb4787354a -> 3edd6c7e1d50)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 15fb4787354a [SPARK-48321][CONNECT][TESTS] Avoid using deprecated methods in dsl add 3edd6c7e1d50 [SPARK-48312][SQL] Improve Alias.removeNonInheritableMetadata performance No new revisions were added by this update. Summary of changes: .../main/scala/org/apache/spark/sql/types/Metadata.scala | 7 +++ .../spark/sql/catalyst/expressions/namedExpressions.scala | 14 +++--- 2 files changed, 18 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48308][CORE] Unify getting data schema without partition columns in FileSourceStrategy
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 57948c865e06 [SPARK-48308][CORE] Unify getting data schema without partition columns in FileSourceStrategy 57948c865e06 is described below commit 57948c865e064469a75c92f8b58c632b9b40fdd3 Author: Johan Lasperas AuthorDate: Thu May 16 22:38:02 2024 +0800 [SPARK-48308][CORE] Unify getting data schema without partition columns in FileSourceStrategy ### What changes were proposed in this pull request? Compute the schema of the data without partition columns only once in FileSourceStrategy. ### Why are the changes needed? In FileSourceStrategy, the schema of the data excluding partition columns is computed twice in slightly different ways: using an AttributeSet (`partitionSet`) and using the attributes directly (`partitionColumns`). These don't have exactly the same semantics: an AttributeSet compares only expression IDs, while comparing against the actual attributes also uses the name, type, nullability and metadata. We want to use the former here. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests ### Was this patch authored or co-authored using generative AI tooling? No Closes #46619 from johanl-db/reuse-schema-without-partition-columns. Authored-by: Johan Lasperas Signed-off-by: Wenchen Fan --- .../apache/spark/sql/execution/datasources/FileSourceStrategy.scala| 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala index 8333c276cdd8..d31cb111924b 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala @@ -216,9 +216,8 @@ object FileSourceStrategy extends Strategy with PredicateHelper with Logging { val requiredExpressions: Seq[NamedExpression] = filterAttributes.toSeq ++ projects val requiredAttributes = AttributeSet(requiredExpressions) - val readDataColumns = dataColumns + val readDataColumns = dataColumnsWithoutPartitionCols .filter(requiredAttributes.contains) -.filterNot(partitionColumns.contains) // Metadata attributes are part of a column of type struct up to this point. Here we extract // this column from the schema and specify a matcher for that. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
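A toy illustration (not the patch code) of the semantic gap described above between filtering through an `AttributeSet` and filtering against the attributes directly:

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, AttributeSet}
import org.apache.spark.sql.types.{IntegerType, MetadataBuilder}

val part = AttributeReference("p", IntegerType)()
// Same exprId but different metadata: AttributeSet membership keys on the
// exprId alone, while Seq[Attribute].contains also compares name, type,
// nullability and metadata.
val partWithMeta = part.withMetadata(new MetadataBuilder().putString("k", "v").build())
assert(AttributeSet(part :: Nil).contains(partWithMeta))
assert(!(part :: Nil).contains(partWithMeta))
```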
(spark) branch master updated (fa83d0f8fce7 -> 4be0828e6e6a)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from fa83d0f8fce7 [SPARK-48296][SQL] Codegen Support for `to_xml` add 4be0828e6e6a [SPARK-48288] Add source data type for connector cast expression No new revisions were added by this update. Summary of changes: .../apache/spark/sql/connector/expressions/Cast.java | 18 +- .../sql/connector/util/V2ExpressionSQLBuilder.java | 6 +++--- .../spark/sql/catalyst/util/V2ExpressionBuilder.scala | 2 +- .../scala/org/apache/spark/sql/jdbc/JdbcDialects.scala | 4 ++-- 4 files changed, 23 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48252][SQL] Update CommonExpressionRef when necessary
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ca3593288d57 [SPARK-48252][SQL] Update CommonExpressionRef when necessary ca3593288d57 is described below commit ca3593288d577435a193f356b5214cf6f4bd534a Author: Wenchen Fan AuthorDate: Thu May 16 09:42:36 2024 +0800 [SPARK-48252][SQL] Update CommonExpressionRef when necessary ### What changes were proposed in this pull request? The `With` expression assumes that it should be created after all input expressions are fully resolved. This is mostly true (function lookup happens after function input expressions are resolved), but there is a special case of column resolution in HAVING: we use `TempResolvedColumn` to try one column resolution option. If it doesn't work, re-resolve the column, which may be a different data type. `With` expression should update the refs when this happens. ### Why are the changes needed? bug fix, otherwise the query will fail ### Does this PR introduce _any_ user-facing change? This feature is not released yet. ### How was this patch tested? new test ### Was this patch authored or co-authored using generative AI tooling? no Closes #46552 from cloud-fan/with. Lead-authored-by: Wenchen Fan Co-authored-by: Wenchen Fan Signed-off-by: Wenchen Fan --- .../apache/spark/sql/catalyst/expressions/With.scala | 18 +- .../optimizer/RewriteWithExpressionSuite.scala | 14 ++ 2 files changed, 31 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/With.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/With.scala index 29794b33641c..5f6f9afa5797 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/With.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/With.scala @@ -40,7 +40,23 @@ case class With(child: Expression, defs: Seq[CommonExpressionDef]) override def children: Seq[Expression] = child +: defs override protected def withNewChildrenInternal( newChildren: IndexedSeq[Expression]): Expression = { -copy(child = newChildren.head, defs = newChildren.tail.map(_.asInstanceOf[CommonExpressionDef])) +val newDefs = newChildren.tail.map(_.asInstanceOf[CommonExpressionDef]) +// If any `CommonExpressionDef` has been updated (data type or nullability), also update its +// `CommonExpressionRef` in the `child`. 
+val newChild = newDefs.filter(_.resolved).foldLeft(newChildren.head) { (result, newDef) => + defs.find(_.id == newDef.id).map { oldDef => +if (newDef.dataType != oldDef.dataType || newDef.nullable != oldDef.nullable) { + val newRef = new CommonExpressionRef(newDef) + result.transform { +case oldRef: CommonExpressionRef if oldRef.id == newRef.id => + newRef + } +} else { + result +} + }.getOrElse(result) +} +copy(child = newChild, defs = newDefs) } /** diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpressionSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpressionSuite.scala index aa8ffb2b0454..0aeca961aa51 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpressionSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpressionSuite.scala @@ -18,6 +18,7 @@ package org.apache.spark.sql.catalyst.optimizer import org.apache.spark.SparkException +import org.apache.spark.sql.catalyst.analysis.TempResolvedColumn import org.apache.spark.sql.catalyst.dsl.expressions._ import org.apache.spark.sql.catalyst.dsl.plans._ import org.apache.spark.sql.catalyst.expressions._ @@ -438,4 +439,17 @@ class RewriteWithExpressionSuite extends PlanTest { Optimizer.execute(plan) } } + + test("SPARK-48252: TempResolvedColumn in common expression") { +val a = testRelation.output.head +val tempResolved = TempResolvedColumn(a, Seq("a")) +val expr = With(tempResolved) { case Seq(ref) => + ref === 1 +} +val plan = testRelation.having($"b")(avg("a").as("a"))(expr).analyze +comparePlans( + Optimizer.execute(plan), + testRelation.groupBy($"b")(avg("a").as("a")).where($"a" === 1).analyze +) + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
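For context, the HAVING scenario described above can be reached from SQL along these lines (a hedged sketch: it assumes `nullif` is rewritten through a common-expression `With`, which the commit does not state explicitly):

```scala
// `a` in HAVING first resolves as the int grouping column via
// TempResolvedColumn and may then be re-resolved as the double aggregate
// alias, so the refs inside the With must pick up the new data type.
spark.sql(
  """SELECT avg(a) AS a
    |FROM VALUES (1, 'x'), (2, 'x') AS t(a, b)
    |GROUP BY b
    |HAVING nullif(a, 1.0) IS NOT NULL""".stripMargin).show()
```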
(spark) branch branch-3.4 updated: [SPARK-48172][SQL] Fix escaping issues in JDBCDialects
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 0e7156d2d801 [SPARK-48172][SQL] Fix escaping issues in JDBCDialects 0e7156d2d801 is described below commit 0e7156d2d80171876c7a5e674349c53ee013be38 Author: Mihailo Milosevic AuthorDate: Wed May 15 22:15:52 2024 +0800 [SPARK-48172][SQL] Fix escaping issues in JDBCDialects This PR is a fix of https://github.com/apache/spark/pull/46437. The previous PR was reverted as `LONGTEXT` is not supported by all dialects. Special case escaping for MySQL and fix issues with redundant escaping for ' character. New changes introduced in the fix include change `LONGTEXT` -> `VARCHAR(50)`, as well as fix for table naming in the tests. When pushing down startsWith, endsWith and contains they are converted to LIKE. This requires addition of escape characters for these expressions. Unfortunately, MySQL uses ESCAPE '\\' syntax instead of ESCAPE '\' which would cause errors when trying to push down. Yes Tests for each existing dialect. No. Closes #46588 from mihailom-db/SPARK-48172. Authored-by: Mihailo Milosevic Signed-off-by: Wenchen Fan (cherry picked from commit 9e386b472981979e368a5921c58da5bfefe3acfe) Signed-off-by: Wenchen Fan --- .../spark/sql/jdbc/v2/DB2IntegrationSuite.scala| 6 + .../sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala | 11 + .../sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 6 + .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 6 + .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala | 6 + .../sql/jdbc/v2/PostgresIntegrationSuite.scala | 6 + .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 229 + .../sql/connector/util/V2ExpressionSQLBuilder.java | 3 - .../sql/connector/expressions/expressions.scala| 4 +- .../org/apache/spark/sql/jdbc/H2Dialect.scala | 7 - .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 15 ++ .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 6 +- 12 files changed, 291 insertions(+), 14 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala index 1a25cd2802dd..fd99bb2a3bc5 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala @@ -67,6 +67,12 @@ class DB2IntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JDBCTest { connection.prepareStatement( "CREATE TABLE employee (dept INTEGER, name VARCHAR(10), salary DECIMAL(20, 2), bonus DOUBLE)") .executeUpdate() +connection.prepareStatement( + s"""CREATE TABLE pattern_testing_table ( + |pattern_testing_col VARCHAR(50) + |) + """.stripMargin +).executeUpdate() } override def testUpdateColumnType(tbl: String): Unit = { diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala index 72edfc9f1bf1..5f4f0b7a3afb 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala +++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala @@ -38,6 +38,17 @@ abstract class DockerJDBCIntegrationV2Suite extends DockerJDBCIntegrationSuite { .executeUpdate() connection.prepareStatement("INSERT INTO employee VALUES (6, 'jen', 12000, 1200)") .executeUpdate() + +connection.prepareStatement( + s""" + |INSERT INTO pattern_testing_table VALUES + |('special_character_quote''_present'), + |('special_character_quote_not_present'), + |('special_character_percent%_present'), + |('special_character_percent_not_present'), + |('special_character_underscore_present'), + |('special_character_underscorenot_present') + """.stripMargin).executeUpdate() } def tablePreparation(connection: Connection): Unit diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala index a527c6f8cb5b..51f31220d9a5 100644 --
(spark) branch branch-3.5 updated: [SPARK-48172][SQL] Fix escaping issues in JDBCDialects
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 210ed2521d3d [SPARK-48172][SQL] Fix escaping issues in JDBCDialects 210ed2521d3d is described below commit 210ed2521d3dc1202cd1ba855ed5e729a5d940d0 Author: Mihailo Milosevic AuthorDate: Wed May 15 22:15:52 2024 +0800 [SPARK-48172][SQL] Fix escaping issues in JDBCDialects This PR is a fix of https://github.com/apache/spark/pull/46437. The previous PR was reverted as `LONGTEXT` is not supported by all dialects. Special case escaping for MySQL and fix issues with redundant escaping for ' character. New changes introduced in the fix include change `LONGTEXT` -> `VARCHAR(50)`, as well as fix for table naming in the tests. When pushing down startsWith, endsWith and contains they are converted to LIKE. This requires addition of escape characters for these expressions. Unfortunately, MySQL uses ESCAPE '\\' syntax instead of ESCAPE '\' which would cause errors when trying to push down. Yes Tests for each existing dialect. No. Closes #46588 from mihailom-db/SPARK-48172. Authored-by: Mihailo Milosevic Signed-off-by: Wenchen Fan (cherry picked from commit 9e386b472981979e368a5921c58da5bfefe3acfe) Signed-off-by: Wenchen Fan --- .../spark/sql/jdbc/v2/DB2IntegrationSuite.scala| 6 + .../sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala | 11 + .../sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 6 + .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 6 + .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala | 6 + .../sql/jdbc/v2/PostgresIntegrationSuite.scala | 6 + .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 229 + .../sql/connector/util/V2ExpressionSQLBuilder.java | 3 - .../sql/connector/expressions/expressions.scala| 4 +- .../org/apache/spark/sql/jdbc/H2Dialect.scala | 7 - .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 15 ++ .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 6 +- 12 files changed, 291 insertions(+), 14 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala index 9a78244f5326..5bcc8afefb1d 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala @@ -80,6 +80,12 @@ class DB2IntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JDBCTest { connection.prepareStatement( "CREATE TABLE employee (dept INTEGER, name VARCHAR(10), salary DECIMAL(20, 2), bonus DOUBLE)") .executeUpdate() +connection.prepareStatement( + s"""CREATE TABLE pattern_testing_table ( + |pattern_testing_col VARCHAR(50) + |) + """.stripMargin +).executeUpdate() } override def testUpdateColumnType(tbl: String): Unit = { diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala index 72edfc9f1bf1..5f4f0b7a3afb 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala +++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala @@ -38,6 +38,17 @@ abstract class DockerJDBCIntegrationV2Suite extends DockerJDBCIntegrationSuite { .executeUpdate() connection.prepareStatement("INSERT INTO employee VALUES (6, 'jen', 12000, 1200)") .executeUpdate() + +connection.prepareStatement( + s""" + |INSERT INTO pattern_testing_table VALUES + |('special_character_quote''_present'), + |('special_character_quote_not_present'), + |('special_character_percent%_present'), + |('special_character_percent_not_present'), + |('special_character_underscore_present'), + |('special_character_underscorenot_present') + """.stripMargin).executeUpdate() } def tablePreparation(connection: Connection): Unit diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala index 0dc3a39f4db5..0bb2ea8249b3 100644 --
(spark) branch master updated: [SPARK-48172][SQL] Fix escaping issues in JDBCDialects
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9e386b472981 [SPARK-48172][SQL] Fix escaping issues in JDBCDialects 9e386b472981 is described below commit 9e386b472981979e368a5921c58da5bfefe3acfe Author: Mihailo Milosevic AuthorDate: Wed May 15 22:15:52 2024 +0800 [SPARK-48172][SQL] Fix escaping issues in JDBCDialects This PR is a fix of https://github.com/apache/spark/pull/46437. The previous PR was reverted as `LONGTEXT` is not supported by all dialects. ### What changes were proposed in this pull request? Special case escaping for MySQL and fix issues with redundant escaping for ' character. New changes introduced in the fix include change `LONGTEXT` -> `VARCHAR(50)`, as well as fix for table naming in the tests. ### Why are the changes needed? When pushing down startsWith, endsWith and contains they are converted to LIKE. This requires addition of escape characters for these expressions. Unfortunately, MySQL uses ESCAPE '\\' syntax instead of ESCAPE '\' which would cause errors when trying to push down. ### Does this PR introduce any user-facing change? Yes ### How was this patch tested? Tests for each existing dialect. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46588 from mihailom-db/SPARK-48172. Authored-by: Mihailo Milosevic Signed-off-by: Wenchen Fan --- .../spark/sql/jdbc/v2/DB2IntegrationSuite.scala| 6 + .../sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala | 11 + .../sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 6 + .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 6 + .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala | 6 + .../sql/jdbc/v2/PostgresIntegrationSuite.scala | 6 + .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 229 + .../sql/connector/util/V2ExpressionSQLBuilder.java | 1 - .../sql/connector/expressions/expressions.scala| 4 +- .../org/apache/spark/sql/jdbc/H2Dialect.scala | 7 - .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 15 ++ .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 6 +- 12 files changed, 291 insertions(+), 12 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala index 3642094d11b2..57129e9d846f 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala @@ -62,6 +62,12 @@ class DB2IntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JDBCTest { connection.prepareStatement( "CREATE TABLE employee (dept INTEGER, name VARCHAR(10), salary DECIMAL(20, 2), bonus DOUBLE)") .executeUpdate() +connection.prepareStatement( + s"""CREATE TABLE pattern_testing_table ( + |pattern_testing_col VARCHAR(50) + |) + """.stripMargin +).executeUpdate() } override def testUpdateColumnType(tbl: String): Unit = { diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala index 72edfc9f1bf1..5f4f0b7a3afb 100644 --- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala @@ -38,6 +38,17 @@ abstract class DockerJDBCIntegrationV2Suite extends DockerJDBCIntegrationSuite { .executeUpdate() connection.prepareStatement("INSERT INTO employee VALUES (6, 'jen', 12000, 1200)") .executeUpdate() + +connection.prepareStatement( + s""" + |INSERT INTO pattern_testing_table VALUES + |('special_character_quote''_present'), + |('special_character_quote_not_present'), + |('special_character_percent%_present'), + |('special_character_percent_not_present'), + |('special_character_underscore_present'), + |('special_character_underscorenot_present') + """.stripMargin).executeUpdate() } def tablePreparation(connection: Connection): Unit diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala b/connector/docker-integration-tests/
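A sketch of the behavior these suites exercise (the URL and column values are placeholders): a pushed-down `startsWith` compiles to `LIKE` with the SQL wildcards escaped, and the MySQL dialect must emit the doubled-backslash `ESCAPE` form:

```scala
import org.apache.spark.sql.functions.col

// Hypothetical read against the pattern_testing_table created above.
val matched = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://host:3306/db") // placeholder URL
  .option("dbtable", "pattern_testing_table")
  .load()
  .filter(col("pattern_testing_col").startsWith("special_character_percent%"))
// Roughly the predicate MySQL should receive (note the escaped _ and %):
//   pattern_testing_col LIKE 'special\_character\_percent\%%' ESCAPE '\\'
```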
(spark) branch master updated (8c0a7ba82c98 -> 5e87e9fbd6e6)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 8c0a7ba82c98 [SPARK-48160][SQL] Add collation support for XPATH expressions add 5e87e9fbd6e6 [SPARK-48277] Improve error message for ErrorClassesJsonReader.getErrorMessage No new revisions were added by this update. Summary of changes: .../src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48160][SQL] Add collation support for XPATH expressions
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8c0a7ba82c98 [SPARK-48160][SQL] Add collation support for XPATH expressions 8c0a7ba82c98 is described below commit 8c0a7ba82c98c7f7e686c4ee81d2aad49cc7a6e0 Author: Uros Bojanic <157381213+uros...@users.noreply.github.com> AuthorDate: Wed May 15 14:24:46 2024 +0800 [SPARK-48160][SQL] Add collation support for XPATH expressions ### What changes were proposed in this pull request? Introduce collation awareness for XPath expressions: xpath_boolean, xpath_short, xpath_int, xpath_long, xpath_float, xpath_double, xpath_string, xpath. ### Why are the changes needed? Add collation support for Xpath expressions in Spark. ### Does this PR introduce _any_ user-facing change? Yes, users should now be able to use collated strings within arguments for XPath functions: xpath_boolean, xpath_short, xpath_int, xpath_long, xpath_float, xpath_double, xpath_string, xpath. ### How was this patch tested? E2e sql tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46508 from uros-db/xpath-expressions. Authored-by: Uros Bojanic <157381213+uros...@users.noreply.github.com> Signed-off-by: Wenchen Fan --- .../spark/sql/catalyst/expressions/xml/xpath.scala | 11 -- .../spark/sql/CollationSQLExpressionsSuite.scala | 44 ++ 2 files changed, 51 insertions(+), 4 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xml/xpath.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xml/xpath.scala index c3a285178c11..f65061e8d0ea 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xml/xpath.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xml/xpath.scala @@ -23,6 +23,8 @@ import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.expressions.Cast._ import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback import org.apache.spark.sql.catalyst.util.GenericArrayData +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.internal.types.StringTypeAnyCollation import org.apache.spark.sql.types._ import org.apache.spark.unsafe.types.UTF8String @@ -39,7 +41,8 @@ abstract class XPathExtract /** XPath expressions are always nullable, e.g. if the xml string is empty. 
*/ override def nullable: Boolean = true - override def inputTypes: Seq[AbstractDataType] = Seq(StringType, StringType) + override def inputTypes: Seq[AbstractDataType] = +Seq(StringTypeAnyCollation, StringTypeAnyCollation) override def checkInputDataTypes(): TypeCheckResult = { if (!path.foldable) { @@ -47,7 +50,7 @@ abstract class XPathExtract errorSubClass = "NON_FOLDABLE_INPUT", messageParameters = Map( "inputName" -> toSQLId("path"), - "inputType" -> toSQLType(StringType), + "inputType" -> toSQLType(StringTypeAnyCollation), "inputExpr" -> toSQLExpr(path) ) ) @@ -221,7 +224,7 @@ case class XPathDouble(xml: Expression, path: Expression) extends XPathExtract { // scalastyle:on line.size.limit case class XPathString(xml: Expression, path: Expression) extends XPathExtract { override def prettyName: String = "xpath_string" - override def dataType: DataType = StringType + override def dataType: DataType = SQLConf.get.defaultStringType override def nullSafeEval(xml: Any, path: Any): Any = { val ret = xpathUtil.evalString(xml.asInstanceOf[UTF8String].toString, pathString) @@ -245,7 +248,7 @@ case class XPathString(xml: Expression, path: Expression) extends XPathExtract { // scalastyle:on line.size.limit case class XPathList(xml: Expression, path: Expression) extends XPathExtract { override def prettyName: String = "xpath" - override def dataType: DataType = ArrayType(StringType, containsNull = false) + override def dataType: DataType = ArrayType(SQLConf.get.defaultStringType, containsNull = false) override def nullSafeEval(xml: Any, path: Any): Any = { val nodeList = xpathUtil.evalNodeList(xml.asInstanceOf[UTF8String].toString, pathString) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala index 48c3853bb5cf..37dcdf9bd721 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala @@ -548,6 +548,5
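A minimal usage sketch (assuming a session `spark`): collated arguments now pass the XPath type check, and `xpath_string` returns the session-default string type:

```scala
// Hypothetical example; the XPath 'a/c' extracts the text of the <c> node.
spark.sql(
  """SELECT xpath_string(
    |  collate('<a><b>b</b><c>cc</c></a>', 'UNICODE_CI'),
    |  collate('a/c', 'UNICODE_CI'))""".stripMargin).show() // expected: cc
```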
(spark) branch master updated: [SPARK-48162][SQL] Add collation support for MISC expressions
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 723354039f1d [SPARK-48162][SQL] Add collation support for MISC expressions 723354039f1d is described below commit 723354039f1de587cacdf4ba48c076a896fdffd1 Author: Uros Bojanic <157381213+uros...@users.noreply.github.com> AuthorDate: Wed May 15 14:23:31 2024 +0800 [SPARK-48162][SQL] Add collation support for MISC expressions ### What changes were proposed in this pull request? Introduce collation awareness for misc expressions: raise_error, uuid, version, typeof, aes_encrypt, aes_decrypt. ### Why are the changes needed? Add collation support for misc expressions in Spark. ### Does this PR introduce _any_ user-facing change? Yes, users should now be able to use collated strings within arguments for misc functions: raise_error, uuid, version, typeof, aes_encrypt, aes_decrypt. ### How was this patch tested? E2e sql tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46461 from uros-db/misc-expressions. Authored-by: Uros Bojanic <157381213+uros...@users.noreply.github.com> Signed-off-by: Wenchen Fan --- .../explain-results/function_aes_decrypt.explain | 2 +- .../function_aes_decrypt_with_mode.explain | 2 +- .../function_aes_decrypt_with_mode_padding.explain | 2 +- ...ction_aes_decrypt_with_mode_padding_aad.explain | 2 +- .../explain-results/function_aes_encrypt.explain | 2 +- .../function_aes_encrypt_with_mode.explain | 2 +- .../function_aes_encrypt_with_mode_padding.explain | 2 +- ...nction_aes_encrypt_with_mode_padding_iv.explain | 2 +- ...on_aes_encrypt_with_mode_padding_iv_aad.explain | 2 +- .../function_try_aes_decrypt.explain | 2 +- .../function_try_aes_decrypt_with_mode.explain | 2 +- ...ction_try_aes_decrypt_with_mode_padding.explain | 2 +- ...n_try_aes_decrypt_with_mode_padding_aad.explain | 2 +- .../spark/sql/catalyst/expressions/misc.scala | 14 ++- .../spark/sql/CollationSQLExpressionsSuite.scala | 136 + 15 files changed, 157 insertions(+), 19 deletions(-) diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_aes_decrypt.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_aes_decrypt.explain index 31e03b79eb98..55f1c314671a 100644 --- a/connector/connect/common/src/test/resources/query-tests/explain-results/function_aes_decrypt.explain +++ b/connector/connect/common/src/test/resources/query-tests/explain-results/function_aes_decrypt.explain @@ -1,2 +1,2 @@ -Project [staticinvoke(class org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils, BinaryType, aesDecrypt, cast(g#0 as binary), cast(g#0 as binary), GCM, DEFAULT, cast( as binary), BinaryType, BinaryType, StringType, StringType, BinaryType, true, true, true) AS aes_decrypt(g, g, GCM, DEFAULT, )#0] +Project [staticinvoke(class org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils, BinaryType, aesDecrypt, cast(g#0 as binary), cast(g#0 as binary), GCM, DEFAULT, cast( as binary), BinaryType, BinaryType, StringTypeAnyCollation, StringTypeAnyCollation, BinaryType, true, true, true) AS aes_decrypt(g, g, GCM, DEFAULT, )#0] +- LocalRelation , [id#0L, a#0, b#0, d#0, e#0, f#0, g#0] diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_aes_decrypt_with_mode.explain 
b/connector/connect/common/src/test/resources/query-tests/explain-results/function_aes_decrypt_with_mode.explain index fc572e8fe7c6..762a4f47a058 100644 --- a/connector/connect/common/src/test/resources/query-tests/explain-results/function_aes_decrypt_with_mode.explain +++ b/connector/connect/common/src/test/resources/query-tests/explain-results/function_aes_decrypt_with_mode.explain @@ -1,2 +1,2 @@ -Project [staticinvoke(class org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils, BinaryType, aesDecrypt, cast(g#0 as binary), cast(g#0 as binary), g#0, DEFAULT, cast( as binary), BinaryType, BinaryType, StringType, StringType, BinaryType, true, true, true) AS aes_decrypt(g, g, g, DEFAULT, )#0] +Project [staticinvoke(class org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils, BinaryType, aesDecrypt, cast(g#0 as binary), cast(g#0 as binary), g#0, DEFAULT, cast( as binary), BinaryType, BinaryType, StringTypeAnyCollation, StringTypeAnyCollation, BinaryType, true, true, true) AS aes_decrypt(g, g, g, DEFAULT, )#0] +- LocalRelation , [id#0L, a#0, b#0, d#0, e#0, f#0, g#0] diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_aes_decrypt_with_mode_padding.explain b/connector/connect/com
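A usage sketch for one of the listed functions (the session, key and mode values are illustrative): the string arguments of `aes_encrypt`/`aes_decrypt`, such as the cipher mode, now accept collated strings:

```scala
// Round-trip with a collated cipher mode; '0000111122223333' is a placeholder
// 16-byte key.
spark.sql(
  """SELECT cast(aes_decrypt(
    |  aes_encrypt('Spark', '0000111122223333'),
    |  '0000111122223333',
    |  collate('GCM', 'UNICODE')) AS STRING)""".stripMargin).show() // expected: Spark
```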
(spark) branch master updated: [SPARK-48263] Collate function support for non UTF8_BINARY strings
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 91da2caa409c [SPARK-48263] Collate function support for non UTF8_BINARY strings 91da2caa409c is described below commit 91da2caa409cb156a970fea0fc8355fcd8c6a2e6 Author: Nebojsa Savic AuthorDate: Tue May 14 23:39:26 2024 +0800 [SPARK-48263] Collate function support for non UTF8_BINARY strings ### What changes were proposed in this pull request? collate(<expr>, <collationName>) does not work when there is a config for default collation set which configures a non-UTF8_BINARY collation as default. ### Why are the changes needed? Fixing the compatibility issue between the default collation config and the collate function. ### Does this PR introduce _any_ user-facing change? Customers will be able to execute the collate(<expr>, <collationName>) function even when the default collation config is configured to some other collation than UTF8_BINARY. We are expanding the surface area for customers. ### How was this patch tested? Added tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46574 from nebojsa-db/SPARK-48263. Authored-by: Nebojsa Savic Signed-off-by: Wenchen Fan --- .../sql/catalyst/expressions/collationExpressions.scala| 4 ++-- .../test/scala/org/apache/spark/sql/CollationSuite.scala | 14 -- 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collationExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collationExpressions.scala index 6af00e193d94..7c02475a60ad 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collationExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collationExpressions.scala @@ -57,14 +57,14 @@ object CollateExpressionBuilder extends ExpressionBuilder { expressions match { case Seq(e: Expression, collationExpr: Expression) => (collationExpr.dataType, collationExpr.foldable) match { - case (StringType, true) => + case (_: StringType, true) => val evalCollation = collationExpr.eval() if (evalCollation == null) { throw QueryCompilationErrors.unexpectedNullError("collation", collationExpr) } else { Collate(e, evalCollation.toString) } - case (StringType, false) => throw QueryCompilationErrors.nonFoldableArgumentError( + case (_: StringType, false) => throw QueryCompilationErrors.nonFoldableArgumentError( funcName, "collationName", StringType) case (_, _) => throw QueryCompilationErrors.unexpectedInputDataTypeError( funcName, 1, StringType, collationExpr) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala index fce9ad3cc184..b22a762a2954 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala @@ -67,8 +67,18 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlanHelper { } test("collate function syntax") { -assert(sql(s"select collate('aaa', 'utf8_binary')").schema(0).dataType == StringType(0)) -assert(sql(s"select collate('aaa', 'utf8_binary_lcase')").schema(0).dataType == StringType(1)) +assert(sql(s"select collate('aaa', 'utf8_binary')").schema(0).dataType == + StringType("UTF8_BINARY")) +assert(sql(s"select collate('aaa', 
'utf8_binary_lcase')").schema(0).dataType == + StringType("UTF8_BINARY_LCASE")) + } + + test("collate function syntax with default collation set") { +withSQLConf(SqlApiConf.DEFAULT_COLLATION -> "UTF8_BINARY_LCASE") { + assert(sql(s"select collate('aaa', 'utf8_binary_lcase')").schema(0).dataType == +StringType("UTF8_BINARY_LCASE")) + assert(sql(s"select collate('aaa', 'UNICODE')").schema(0).dataType == StringType("UNICODE")) +} } test("collate function syntax invalid arg count") { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47301][SQL][TESTS][FOLLOWUP] Remove workaround for ParquetIOSuite
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 97bf1ee9f6f7 [SPARK-47301][SQL][TESTS][FOLLOWUP] Remove workaround for ParquetIOSuite 97bf1ee9f6f7 is described below commit 97bf1ee9f6f76d49df50560bf792135308f289a9 Author: panbingkun AuthorDate: Tue May 14 23:37:47 2024 +0800 [SPARK-47301][SQL][TESTS][FOLLOWUP] Remove workaround for ParquetIOSuite ### What changes were proposed in this pull request? This PR aims to remove the workaround in ParquetIOSuite. ### Why are the changes needed? After https://github.com/apache/spark/pull/46562 is completed, the reason the unit test `SPARK-7837 Do not close output writer twice when commitTask() fails` could fail due to a different event-processing order no longer exists, so we remove the previous workaround here. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manual test. - Passes GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46577 from panbingkun/SPARK-47301_FOLLOWUP. Authored-by: panbingkun Signed-off-by: Wenchen Fan --- .../spark/sql/execution/datasources/parquet/ParquetIOSuite.scala | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala index ba8fef0b3a8d..4fb8faa43a39 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala @@ -1589,12 +1589,8 @@ class ParquetIOWithoutOutputCommitCoordinationSuite .coalesce(1) df.write.partitionBy("a").options(extraOptions).parquet(dir.getCanonicalPath) } -if (m2.getErrorClass != null) { - assert(m2.getErrorClass == "TASK_WRITE_FAILED") - assert(m2.getCause.getMessage.contains("Intentional exception for testing purposes")) -} else { - assert(m2.getMessage.contains("TASK_WRITE_FAILED")) -} +assert(m2.getErrorClass == "TASK_WRITE_FAILED") +assert(m2.getCause.getMessage.contains("Intentional exception for testing purposes")) } } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-48172][SQL] Fix escaping issues in JDBC Dialects
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new a848e2790cba [SPARK-48172][SQL] Fix escaping issues in JDBC Dialects a848e2790cba is described below commit a848e2790cba0b7ee77d391dc534146bd35ee50a Author: Mihailo Milosevic AuthorDate: Tue May 14 23:31:46 2024 +0800 [SPARK-48172][SQL] Fix escaping issues in JDBC Dialects Special case escaping for MySQL and fix issues with redundant escaping for ' character. When pushing down startsWith, endsWith and contains they are converted to LIKE. This requires addition of escape characters for these expressions. Unfortunately, MySQL uses ESCAPE '\\' syntax instead of ESCAPE '\' which would cause errors when trying to push down. Yes Tests for each existing dialect. No. Closes #46437 from mihailom-db/SPARK-48172. Authored-by: Mihailo Milosevic Signed-off-by: Wenchen Fan (cherry picked from commit 47006a493f98ca85196194d16d58b5847177b1a3) Signed-off-by: Wenchen Fan --- .../spark/sql/jdbc/v2/DB2IntegrationSuite.scala| 6 + .../sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala | 11 + .../sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 6 + .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 6 + .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala | 6 + .../sql/jdbc/v2/PostgresIntegrationSuite.scala | 6 + .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 229 + .../sql/connector/util/V2ExpressionSQLBuilder.java | 3 - .../sql/connector/expressions/expressions.scala| 4 +- .../org/apache/spark/sql/jdbc/H2Dialect.scala | 7 - .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 15 ++ .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 6 +- 12 files changed, 291 insertions(+), 14 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala index 1a25cd2802dd..11ddce68aecd 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala @@ -67,6 +67,12 @@ class DB2IntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JDBCTest { connection.prepareStatement( "CREATE TABLE employee (dept INTEGER, name VARCHAR(10), salary DECIMAL(20, 2), bonus DOUBLE)") .executeUpdate() +connection.prepareStatement( + s"""CREATE TABLE pattern_testing_table ( + |pattern_testing_col LONGTEXT + |) + """.stripMargin +).executeUpdate() } override def testUpdateColumnType(tbl: String): Unit = { diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala index 72edfc9f1bf1..a42caeafe6fe 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala @@ -38,6 +38,17 @@ abstract class DockerJDBCIntegrationV2Suite extends DockerJDBCIntegrationSuite { .executeUpdate() connection.prepareStatement("INSERT INTO employee VALUES (6, 'jen', 12000, 1200)") .executeUpdate() + +connection.prepareStatement( + s""" + 
|INSERT INTO pattern_testing_table VALUES + |('special_character_quote\\'_present'), + |('special_character_quote_not_present'), + |('special_character_percent%_present'), + |('special_character_percent_not_present'), + |('special_character_underscore_present'), + |('special_character_underscorenot_present') + """.stripMargin).executeUpdate() } def tablePreparation(connection: Connection): Unit diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala index a527c6f8cb5b..6658b5ed6c77 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala @@ -66,6 +66,12 @@ class MsSqlServerIntegrationSuite extends DockerJ
(spark) branch branch-3.5 updated: [SPARK-48172][SQL] Fix escaping issues in JDBC Dialects
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new f37fa436cd4e [SPARK-48172][SQL] Fix escaping issues in JDBC Dialects f37fa436cd4e is described below commit f37fa436cd4e0ef9f486a60f9af91a3ce0195df9 Author: Mihailo Milosevic AuthorDate: Tue May 14 23:31:46 2024 +0800 [SPARK-48172][SQL] Fix escaping issues in JDBC Dialects Special case escaping for MySQL and fix issues with redundant escaping for ' character. When pushing down startsWith, endsWith and contains they are converted to LIKE. This requires addition of escape characters for these expressions. Unfortunately, MySQL uses ESCAPE '\\' syntax instead of ESCAPE '\' which would cause errors when trying to push down. Yes Tests for each existing dialect. No. Closes #46437 from mihailom-db/SPARK-48172. Authored-by: Mihailo Milosevic Signed-off-by: Wenchen Fan (cherry picked from commit 47006a493f98ca85196194d16d58b5847177b1a3) Signed-off-by: Wenchen Fan --- .../spark/sql/jdbc/v2/DB2IntegrationSuite.scala| 6 + .../sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala | 11 + .../sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 6 + .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 6 + .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala | 6 + .../sql/jdbc/v2/PostgresIntegrationSuite.scala | 6 + .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 229 + .../sql/connector/util/V2ExpressionSQLBuilder.java | 3 - .../sql/connector/expressions/expressions.scala| 4 +- .../org/apache/spark/sql/jdbc/H2Dialect.scala | 7 - .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 15 ++ .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 6 +- 12 files changed, 291 insertions(+), 14 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala index 9a78244f5326..9b4916ddd36b 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala @@ -80,6 +80,12 @@ class DB2IntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JDBCTest { connection.prepareStatement( "CREATE TABLE employee (dept INTEGER, name VARCHAR(10), salary DECIMAL(20, 2), bonus DOUBLE)") .executeUpdate() +connection.prepareStatement( + s"""CREATE TABLE pattern_testing_table ( + |pattern_testing_col LONGTEXT + |) + """.stripMargin +).executeUpdate() } override def testUpdateColumnType(tbl: String): Unit = { diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala index 72edfc9f1bf1..a42caeafe6fe 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala @@ -38,6 +38,17 @@ abstract class DockerJDBCIntegrationV2Suite extends DockerJDBCIntegrationSuite { .executeUpdate() connection.prepareStatement("INSERT INTO employee VALUES (6, 'jen', 12000, 1200)") .executeUpdate() + +connection.prepareStatement( + s""" + 
|INSERT INTO pattern_testing_table VALUES + |('special_character_quote\\'_present'), + |('special_character_quote_not_present'), + |('special_character_percent%_present'), + |('special_character_percent_not_present'), + |('special_character_underscore_present'), + |('special_character_underscorenot_present') + """.stripMargin).executeUpdate() } def tablePreparation(connection: Connection): Unit diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala index 0dc3a39f4db5..57a2667557fa 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala @@ -86,6 +86,12 @@ class MsSqlServerIntegrationSuite extends DockerJ
(spark) branch master updated: [SPARK-48172][SQL] Fix escaping issues in JDBC Dialects
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 47006a493f98 [SPARK-48172][SQL] Fix escaping issues in JDBC Dialects 47006a493f98 is described below commit 47006a493f98ca85196194d16d58b5847177b1a3 Author: Mihailo Milosevic AuthorDate: Tue May 14 23:31:46 2024 +0800 [SPARK-48172][SQL] Fix escaping issues in JDBC Dialects ### What changes were proposed in this pull request? Special case escaping for MySQL and fix issues with redundant escaping for ' character. ### Why are the changes needed? When pushing down startsWith, endsWith and contains they are converted to LIKE. This requires addition of escape characters for these expressions. Unfortunately, MySQL uses ESCAPE '\\' syntax instead of ESCAPE '\' which would cause errors when trying to push down. ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? Tests for each existing dialect. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46437 from mihailom-db/SPARK-48172. Authored-by: Mihailo Milosevic Signed-off-by: Wenchen Fan --- .../spark/sql/jdbc/v2/DB2IntegrationSuite.scala| 6 + .../sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala | 11 + .../sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 6 + .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 6 + .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala | 6 + .../sql/jdbc/v2/PostgresIntegrationSuite.scala | 6 + .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 229 + .../sql/connector/util/V2ExpressionSQLBuilder.java | 1 - .../sql/connector/expressions/expressions.scala| 4 +- .../org/apache/spark/sql/jdbc/H2Dialect.scala | 7 - .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 15 ++ .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 6 +- 12 files changed, 291 insertions(+), 12 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala index 3642094d11b2..36795747319d 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala @@ -62,6 +62,12 @@ class DB2IntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JDBCTest { connection.prepareStatement( "CREATE TABLE employee (dept INTEGER, name VARCHAR(10), salary DECIMAL(20, 2), bonus DOUBLE)") .executeUpdate() +connection.prepareStatement( + s"""CREATE TABLE pattern_testing_table ( + |pattern_testing_col LONGTEXT + |) + """.stripMargin +).executeUpdate() } override def testUpdateColumnType(tbl: String): Unit = { diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala index 72edfc9f1bf1..a42caeafe6fe 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala @@ -38,6 +38,17 @@ abstract class DockerJDBCIntegrationV2Suite extends DockerJDBCIntegrationSuite { .executeUpdate() 
connection.prepareStatement("INSERT INTO employee VALUES (6, 'jen', 12000, 1200)") .executeUpdate() + +connection.prepareStatement( + s""" + |INSERT INTO pattern_testing_table VALUES + |('special_character_quote\\'_present'), + |('special_character_quote_not_present'), + |('special_character_percent%_present'), + |('special_character_percent_not_present'), + |('special_character_underscore_present'), + |('special_character_underscorenot_present') + """.stripMargin).executeUpdate() } def tablePreparation(connection: Connection): Unit diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala index b1b8aec5ad33..46530fe5419a 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala +++ b/connector/docker-integration-tes
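For readers following the escaping discussion in the three copies of this commit above, here is a simplified, hypothetical sketch of dialect-aware LIKE generation (the real logic lives in Spark's dialect classes and differs in detail). It shows why the wildcards `%`, `_` and the escape character itself must be escaped when startsWith/endsWith/contains are pushed down as LIKE, and why MySQL needs the doubled backslash: MySQL also treats `\` as an escape inside string literals, so its ESCAPE clause must be written `ESCAPE '\\'` where ANSI dialects write `ESCAPE '\'`.
```
object LikeEscapingSketch {
  // Escape LIKE wildcards and the escape character itself inside a literal.
  def escapeForLike(literal: String): String = literal.flatMap {
    case c @ ('%' | '_' | '\\') => "\\" + c
    case c                      => c.toString
  }

  // Hypothetical helper, not Spark's actual dialect API: render a pushed-down
  // startsWith as a LIKE predicate with the dialect-appropriate ESCAPE clause.
  def startsWithSql(col: String, prefix: String, mysql: Boolean): String = {
    val escaped = escapeForLike(prefix).replace("'", "''") // double SQL quotes
    val escapeClause = if (mysql) "ESCAPE '\\\\'" else "ESCAPE '\\'"
    s"$col LIKE '$escaped%' $escapeClause"
  }

  def main(args: Array[String]): Unit = {
    println(startsWithSql("pattern_testing_col", "50%_off", mysql = false))
    // pattern_testing_col LIKE '50\%\_off%' ESCAPE '\'
    println(startsWithSql("pattern_testing_col", "50%_off", mysql = true))
    // pattern_testing_col LIKE '50\%\_off%' ESCAPE '\\'
  }
}
```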
(spark) branch master updated: [SPARK-48155][SQL] AQEPropagateEmptyRelation for join should check if remain child is just BroadcastQueryStageExec
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e5ad5e94a8c8 [SPARK-48155][SQL] AQEPropagateEmptyRelation for join should check if remain child is just BroadcastQueryStageExec e5ad5e94a8c8 is described below

commit e5ad5e94a8c891210637084a69359c1364201653
Author: Angerszh
AuthorDate: Tue May 14 17:32:56 2024 +0800

[SPARK-48155][SQL] AQEPropagateEmptyRelation for join should check if remain child is just BroadcastQueryStageExec

### What changes were proposed in this pull request?
This is a new approach to fixing [SPARK-39551](https://issues.apache.org/jira/browse/SPARK-39551). The situation arises in AQEPropagateEmptyRelation when one join side is empty and the other side is a BroadcastQueryStageExec. This PR avoids doing the propagation, instead of reverting all of the queryStagePreparationRules' results.

### Why are the changes needed?
Fix bug

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manually tested `SPARK-39551: Invalid plan check - invalid broadcast query stage`; it works well with the current PR and without the original fix.

For the added UT,
```
test("SPARK-48155: AQEPropagateEmptyRelation check remained child for join") {
  withSQLConf(
    SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true") {
    val (_, adaptivePlan) = runAdaptiveAndVerifyResult(
      """
        |SELECT /*+ BROADCAST(t3) */ t3.b, count(t3.a) FROM testData2 t1
        |INNER JOIN (
        |  SELECT * FROM testData2
        |  WHERE b = 0
        |  UNION ALL
        |  SELECT * FROM testData2
        |  WHERE b != 0
        |) t2
        |ON t1.b = t2.b AND t1.a = 0
        |RIGHT OUTER JOIN testData2 t3
        |ON t1.a > t3.a
        |GROUP BY t3.b
      """.stripMargin
    )
    assert(findTopLevelBroadcastNestedLoopJoin(adaptivePlan).size == 1)
    assert(findTopLevelUnion(adaptivePlan).size == 0)
  }
}
```
Before this PR the adaptive plan is
```
*(9) HashAggregate(keys=[b#226], functions=[count(1)], output=[b#226, count(a)#228L])
+- AQEShuffleRead coalesced
   +- ShuffleQueryStage 3
      +- Exchange hashpartitioning(b#226, 5), ENSURE_REQUIREMENTS, [plan_id=356]
         +- *(8) HashAggregate(keys=[b#226], functions=[partial_count(1)], output=[b#226, count#232L])
            +- *(8) Project [b#226]
               +- BroadcastNestedLoopJoin BuildRight, RightOuter, (a#23 > a#225)
                  :- *(7) Project [a#23]
                  :  +- *(7) SortMergeJoin [b#24], [b#220], Inner
                  :     :- *(5) Sort [b#24 ASC NULLS FIRST], false, 0
                  :     :  +- AQEShuffleRead coalesced
                  :     :     +- ShuffleQueryStage 0
                  :     :        +- Exchange hashpartitioning(b#24, 5), ENSURE_REQUIREMENTS, [plan_id=211]
                  :     :           +- *(1) Filter (a#23 = 0)
                  :     :              +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#23, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#24]
                  :     :                 +- Scan[obj#22]
                  :     +- *(6) Sort [b#220 ASC NULLS FIRST], false, 0
                  :        +- AQEShuffleRead coalesced
                  :           +- ShuffleQueryStage 1
                  :              +- Exchange hashpartitioning(b#220, 5), ENSURE_REQUIREMENTS, [plan_id=233]
                  :                 +- Union
                  :                    :- *(2) Project [b#220]
                  :                    :  +- *(2) Filter (b#220 = 0)
                  :                    :     +- *(2) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#219, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#220]
                  :                    :        +- Scan[obj#218]
                  :                    +- *(3) Project [b#223]
                  :                       +- *(3) Filter NOT (b#223 = 0)
                  :                          +- *(3) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#222, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#223]
                  :                          +-
```
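A simplified, self-contained model of the guard this commit adds (these are not Spark's actual classes, and the null-padding projection that PropagateEmptyRelation normally inserts is omitted): when one join side turns empty, the surviving side must not replace the join if it is just a broadcast query stage, since a broadcast stage's output cannot be consumed without a join above it.
```
sealed trait Plan
case object Empty extends Plan
case class Scan(name: String) extends Plan
case class BroadcastStage(child: Plan) extends Plan
case class RightOuterJoin(left: Plan, right: Plan) extends Plan

object PropagateEmptyDemo {
  // Only strip the join when the remaining child is a normal plan node.
  def propagateEmpty(plan: Plan): Plan = plan match {
    case RightOuterJoin(Empty, right) if !right.isInstanceOf[BroadcastStage] => right
    case other => other // keep the join so the broadcast stage stays under it
  }

  def main(args: Array[String]): Unit = {
    println(propagateEmpty(RightOuterJoin(Empty, Scan("t3"))))                 // Scan(t3)
    println(propagateEmpty(RightOuterJoin(Empty, BroadcastStage(Scan("t3"))))) // join kept
  }
}
```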
(spark) branch master updated: [SPARK-46707][SQL][FOLLOWUP] Push down throwable predicate through aggregates
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6766c39b458a [SPARK-46707][SQL][FOLLOWUP] Push down throwable predicate through aggregates 6766c39b458a is described below commit 6766c39b458ad7abacd1a5b11c896efabf36f95c Author: zml1206 AuthorDate: Tue May 14 15:53:43 2024 +0800 [SPARK-46707][SQL][FOLLOWUP] Push down throwable predicate through aggregates ### What changes were proposed in this pull request? Push down throwable predicate through aggregates and add ut for "can't push down nondeterministic filter through aggregate". ### Why are the changes needed? If we can push down a filter through Aggregate, it means the filter only references the grouping keys. The Aggregate operator can't reduce grouping keys so the filter won't see any new data after pushing down. So push down throwable filter through aggregate does not affect exception thrown. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? UT ### Was this patch authored or co-authored using generative AI tooling? No. Closes #44975 from zml1206/SPARK-46707-FOLLOWUP. Authored-by: zml1206 Signed-off-by: Wenchen Fan --- .../spark/sql/catalyst/optimizer/Optimizer.scala | 8 ++-- .../sql/catalyst/optimizer/FilterPushdownSuite.scala | 19 --- 2 files changed, 22 insertions(+), 5 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index dfc1e17c2a29..4ee6d9027a9c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -1768,6 +1768,10 @@ object PushPredicateThroughNonJoin extends Rule[LogicalPlan] with PredicateHelpe val aliasMap = getAliasMap(project) project.copy(child = Filter(replaceAlias(condition, aliasMap), grandChild)) +// We can push down deterministic predicate through Aggregate, including throwable predicate. +// If we can push down a filter through Aggregate, it means the filter only references the +// grouping keys or constants. The Aggregate operator can't reduce distinct values of grouping +// keys so the filter won't see any new data after push down. case filter @ Filter(condition, aggregate: Aggregate) if aggregate.aggregateExpressions.forall(_.deterministic) && aggregate.groupingExpressions.nonEmpty => @@ -1777,8 +1781,8 @@ object PushPredicateThroughNonJoin extends Rule[LogicalPlan] with PredicateHelpe // attributes produced by the aggregate operator's child operator. 
      val (pushDown, stayUp) = splitConjunctivePredicates(condition).partition { cond =>
        val replaced = replaceAlias(cond, aliasMap)
-        cond.deterministic && !cond.throwable &&
-          cond.references.nonEmpty && replaced.references.subsetOf(aggregate.child.outputSet)
+        cond.deterministic && cond.references.nonEmpty &&
+          replaced.references.subsetOf(aggregate.child.outputSet)
      }

      if (pushDown.nonEmpty) {

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala
index 03e65412d166..5027222be6b8 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala
@@ -219,6 +219,17 @@ class FilterPushdownSuite extends PlanTest {
     comparePlans(optimized, correctAnswer)
   }

+  test("Can't push down nondeterministic filter through aggregate") {
+    val originalQuery = testRelation
+      .groupBy($"a")($"a", count($"b") as "c")
+      .where(Rand(10) > $"a")
+      .analyze
+
+    val optimized = Optimize.execute(originalQuery)
+
+    comparePlans(optimized, originalQuery)
+  }
+
   test("filters: combines filters") {
     val originalQuery = testRelation
       .select($"a")
@@ -1483,14 +1494,16 @@ class FilterPushdownSuite {
   test("SPARK-46707: push down predicate with sequence (without step) through aggregates") {
     val x = testRelation.subquery("x")

-    // do not push down when sequence has step param
+    // Always push down sequence as it's deterministic
    val queryWithStep = x.groupBy($"x.a", $"x.b"
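For context on the reasoning above, a hedged, runnable sketch (it assumes only a local SparkSession; the view and column names are made up). The filter references just the grouping key `a`, so pushing it below the aggregate exposes no values it would not have seen after aggregation, which is why even a predicate that may throw (e.g. division under ANSI mode) is safe to push:
```
import org.apache.spark.sql.SparkSession

object ThrowableFilterPushdownDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("demo").getOrCreate()
    // Hypothetical data: grouping key `a` takes values 0, 1, 2.
    spark.range(1, 10).selectExpr("id % 3 AS a", "id AS b").createOrReplaceTempView("t")
    // `10 / a` only ever sees the distinct values of `a`, pushed down or not,
    // so the set of inputs that could make it throw is unchanged.
    spark.sql("SELECT a, count(b) FROM t GROUP BY a HAVING 10 / a > 3").explain(true)
    spark.stop()
  }
}
```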
(spark) branch master updated: [SPARK-48157][SQL] Add collation support for CSV expressions
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e6c914f63079 [SPARK-48157][SQL] Add collation support for CSV expressions e6c914f63079 is described below commit e6c914f630793992eba7a409ec2cd061f385ce02 Author: Uros Bojanic <157381213+uros...@users.noreply.github.com> AuthorDate: Tue May 14 14:17:45 2024 +0800 [SPARK-48157][SQL] Add collation support for CSV expressions ### What changes were proposed in this pull request? Introduce collation awareness for CSV expressions: from_csv, schema_of_csv, to_csv. ### Why are the changes needed? Add collation support for CSV expressions in Spark. ### Does this PR introduce _any_ user-facing change? Yes, users should now be able to use collated strings within arguments for CSV functions: from_csv, schema_of_csv, to_csv. ### How was this patch tested? E2e sql tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46504 from uros-db/csv-expressions. Authored-by: Uros Bojanic <157381213+uros...@users.noreply.github.com> Signed-off-by: Wenchen Fan --- .../sql/catalyst/expressions/csvExpressions.scala | 7 +- .../spark/sql/CollationSQLExpressionsSuite.scala | 112 + 2 files changed, 116 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala index 4714fc1ded9c..cb10440c4832 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala @@ -31,6 +31,7 @@ import org.apache.spark.sql.catalyst.util._ import org.apache.spark.sql.catalyst.util.TypeUtils._ import org.apache.spark.sql.errors.{QueryCompilationErrors, QueryErrorsBase} import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.internal.types.StringTypeAnyCollation import org.apache.spark.sql.types._ import org.apache.spark.unsafe.types.UTF8String @@ -146,7 +147,7 @@ case class CsvToStructs( converter(parser.parse(csv)) } - override def inputTypes: Seq[AbstractDataType] = StringType :: Nil + override def inputTypes: Seq[AbstractDataType] = StringTypeAnyCollation :: Nil override def prettyName: String = "from_csv" @@ -177,7 +178,7 @@ case class SchemaOfCsv( child = child, options = ExprUtils.convertToMapData(options)) - override def dataType: DataType = StringType + override def dataType: DataType = SQLConf.get.defaultStringType override def nullable: Boolean = false @@ -300,7 +301,7 @@ case class StructsToCsv( (row: Any) => UTF8String.fromString(gen.writeToString(row.asInstanceOf[InternalRow])) } - override def dataType: DataType = StringType + override def dataType: DataType = SQLConf.get.defaultStringType override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = copy(timeZoneId = Option(timeZoneId)) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala index 22b29154cd78..f8b3548b956c 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala @@ -313,6 +313,118 @@ class CollationSQLExpressionsSuite }) } + 
test("Support CsvToStructs csv expression with collation") { +case class CsvToStructsTestCase( + input: String, + collationName: String, + schema: String, + options: String, + result: Row, + structFields: Seq[StructField] +) + +val testCases = Seq( + CsvToStructsTestCase("1", "UTF8_BINARY", "'a INT'", "", +Row(1), Seq( + StructField("a", IntegerType, nullable = true) +)), + CsvToStructsTestCase("true, 0.8", "UTF8_BINARY_LCASE", "'A BOOLEAN, B DOUBLE'", "", +Row(true, 0.8), Seq( + StructField("A", BooleanType, nullable = true), + StructField("B", DoubleType, nullable = true) +)), + CsvToStructsTestCase("\"Spark\"", "UNICODE", "'a STRING'", "", +Row("Spark"), Seq( + StructField("a", StringType("UNICODE"), nullable = true) +)), + CsvToStructsTestCase("26/08/2015", "UTF8_BINARY", "'time Timestamp'", +
(spark) branch master updated: [SPARK-48229][SQL] Add collation support for inputFile expressions
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9241b8e8c0df [SPARK-48229][SQL] Add collation support for inputFile expressions 9241b8e8c0df is described below commit 9241b8e8c0dfe35fbe1631fd440527eb72d88de8 Author: Uros Bojanic <157381213+uros...@users.noreply.github.com> AuthorDate: Tue May 14 14:08:30 2024 +0800 [SPARK-48229][SQL] Add collation support for inputFile expressions ### What changes were proposed in this pull request? Introduce collation awareness for inputFile expressions: input_file_name. ### Why are the changes needed? Add collation support for inputFile expressions in Spark. ### Does this PR introduce _any_ user-facing change? Yes, users should now be able to use collated strings within arguments for inputFile functions: input_file_name. ### How was this patch tested? E2e sql tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46503 from uros-db/input-file-block. Authored-by: Uros Bojanic <157381213+uros...@users.noreply.github.com> Signed-off-by: Wenchen Fan --- .../spark/sql/catalyst/expressions/inputFileBlock.scala | 5 +++-- .../apache/spark/sql/CollationSQLExpressionsSuite.scala | 17 + 2 files changed, 20 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/inputFileBlock.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/inputFileBlock.scala index 6cd88367aa9a..65eb995ff32f 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/inputFileBlock.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/inputFileBlock.scala @@ -21,7 +21,8 @@ import org.apache.spark.rdd.InputFileBlockHolder import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, CodeGenerator, ExprCode, FalseLiteral} import org.apache.spark.sql.catalyst.expressions.codegen.Block._ -import org.apache.spark.sql.types.{DataType, LongType, StringType} +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{DataType, LongType} import org.apache.spark.unsafe.types.UTF8String // scalastyle:off whitespace.end.of.line @@ -39,7 +40,7 @@ case class InputFileName() extends LeafExpression with Nondeterministic { override def nullable: Boolean = false - override def dataType: DataType = StringType + override def dataType: DataType = SQLConf.get.defaultStringType override def prettyName: String = "input_file_name" diff --git a/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala index dd5703d1284a..22b29154cd78 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala @@ -1275,6 +1275,23 @@ class CollationSQLExpressionsSuite }) } + test("Support InputFileName expression with collation") { +// Supported collations +Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => { + val query = +s""" + |select input_file_name() + |""".stripMargin + // Result + withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) { +val testQuery = sql(query) +checkAnswer(testQuery, Row("")) +val dataType = 
StringType(collationName)
+assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+      }
+    })
+  }
+
   // TODO: Add more tests for other SQL expressions
 }
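The diff above follows the same recipe as the CSV change: the expression stops hard-coding `StringType` and returns `SQLConf.get.defaultStringType`. A hedged standalone sketch mirroring the new test (it assumes the internal `SqlApiConf.DEFAULT_COLLATION` key the test uses and the `StringType(collationName)` constructor shown above):
```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.internal.SqlApiConf
import org.apache.spark.sql.types.StringType

object InputFileNameCollationDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("demo").getOrCreate()
    // Switch the session default collation via the key the new test exercises.
    spark.conf.set(SqlApiConf.DEFAULT_COLLATION, "UNICODE_CI")
    val df = spark.sql("SELECT input_file_name()")
    // The result type now carries the default collation instead of plain StringType.
    assert(df.schema.head.dataType.sameType(StringType("UNICODE_CI")))
    spark.stop()
  }
}
```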
(spark) branch branch-3.5 updated: [SPARK-48265][SQL] Infer window group limit batch should do constant folding
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 34588a82239a [SPARK-48265][SQL] Infer window group limit batch should do constant folding 34588a82239a is described below

commit 34588a82239a5c12fefed13e271edd963b821b1c
Author: Angerszh
AuthorDate: Tue May 14 13:44:47 2024 +0800

[SPARK-48265][SQL] Infer window group limit batch should do constant folding

### What changes were proposed in this pull request?
The plan after PropagateEmptyRelation may contain a double local limit
```
GlobalLimit 21
+- LocalLimit 21
!  +- Union false, false
!     :- LocalLimit 21
!     :  +- Project [item_id#647L]
!     :     +- Filter ()
!     :        +- Relation db.table[,... 91 more fields] parquet
!     +- LocalLimit 21
!        +- Project [item_id#738L]
!           +- LocalRelation , [, ... 91 more fields]
```
to
```
GlobalLimit 21
+- LocalLimit 21
   +- LocalLimit 21
      +- Project [item_id#647L]
         +- Filter ()
            +- Relation db.table[,... 91 more fields] parquet
```
After the `Infer window group limit` batch's `EliminateLimits`, the plan will be
```
GlobalLimit 21
+- LocalLimit least(21, 21)
   +- Project [item_id#647L]
      +- Filter ()
         +- Relation db.table[,... 91 more fields] parquet
```
This can't be simplified further: a `ConstantFolding` pass is missing here.

### Why are the changes needed?
Fix bug

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #46568 from AngersZh/SPARK-48265.

Authored-by: Angerszh
Signed-off-by: Wenchen Fan
(cherry picked from commit 7974811218c9fb52ac9d07f8983475a885ada81b)
Signed-off-by: Wenchen Fan
---
 .../src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala
index 70a35ea91153..6173703ef3cd 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala
@@ -89,7 +89,8 @@ class SparkOptimizer(
       InferWindowGroupLimit,
       LimitPushDown,
       LimitPushDownThroughWindow,
-      EliminateLimits) :+
+      EliminateLimits,
+      ConstantFolding) :+
     Batch("User Provided Optimizers", fixedPoint, experimentalMethods.extraOptimizations: _*) :+
     Batch("Replace CTE with Repartition", Once, ReplaceCTERefWithRepartition)
(spark) branch master updated: [SPARK-48265][SQL] Infer window group limit batch should do constant folding
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7974811218c9 [SPARK-48265][SQL] Infer window group limit batch should do constant folding 7974811218c9 is described below

commit 7974811218c9fb52ac9d07f8983475a885ada81b
Author: Angerszh
AuthorDate: Tue May 14 13:44:47 2024 +0800

[SPARK-48265][SQL] Infer window group limit batch should do constant folding

### What changes were proposed in this pull request?
The plan after PropagateEmptyRelation may contain a double local limit
```
GlobalLimit 21
+- LocalLimit 21
!  +- Union false, false
!     :- LocalLimit 21
!     :  +- Project [item_id#647L]
!     :     +- Filter ()
!     :        +- Relation db.table[,... 91 more fields] parquet
!     +- LocalLimit 21
!        +- Project [item_id#738L]
!           +- LocalRelation , [, ... 91 more fields]
```
to
```
GlobalLimit 21
+- LocalLimit 21
   +- LocalLimit 21
      +- Project [item_id#647L]
         +- Filter ()
            +- Relation db.table[,... 91 more fields] parquet
```
After the `Infer window group limit` batch's `EliminateLimits`, the plan will be
```
GlobalLimit 21
+- LocalLimit least(21, 21)
   +- Project [item_id#647L]
      +- Filter ()
         +- Relation db.table[,... 91 more fields] parquet
```
This can't be simplified further: a `ConstantFolding` pass is missing here.

### Why are the changes needed?
Fix bug

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #46568 from AngersZh/SPARK-48265.

Authored-by: Angerszh
Signed-off-by: Wenchen Fan
---
 .../src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala
index 70a35ea91153..6173703ef3cd 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala
@@ -89,7 +89,8 @@ class SparkOptimizer(
       InferWindowGroupLimit,
       LimitPushDown,
       LimitPushDownThroughWindow,
-      EliminateLimits) :+
+      EliminateLimits,
+      ConstantFolding) :+
     Batch("User Provided Optimizers", fixedPoint, experimentalMethods.extraOptimizations: _*) :+
     Batch("Replace CTE with Repartition", Once, ReplaceCTERefWithRepartition)
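To see why the one-line fix above matters, a hedged illustration from a spark-shell (it assumes a local session): `EliminateLimits` merges nested limits into `Limit(least(a, b))`, and over two literals only a constant-folding pass can collapse that back to a literal, which this special batch previously lacked.
```
// After the fix, the combined limit folds back to a literal:
spark.sql("SELECT * FROM (SELECT * FROM range(100) LIMIT 21) LIMIT 21").explain()
// expected: GlobalLimit 21, not GlobalLimit least(21, 21)
```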
(spark) branch master updated: [SPARK-48027][SQL][FOLLOWUP] Add comments for the other code branch
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0ea808880e22 [SPARK-48027][SQL][FOLLOWUP] Add comments for the other code branch 0ea808880e22 is described below

commit 0ea808880e22e2b6cc97a3e946123bec035ade93
Author: beliefer
AuthorDate: Tue May 14 13:26:17 2024 +0800

[SPARK-48027][SQL][FOLLOWUP] Add comments for the other code branch

### What changes were proposed in this pull request?
This PR proposes to add comments for the other code branch.

### Why are the changes needed?
https://github.com/apache/spark/pull/46263 missed the comments for the other code branch.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
N/A

### Was this patch authored or co-authored using generative AI tooling?
'No'.

Closes #46536 from beliefer/SPARK-48027_followup.

Authored-by: beliefer
Signed-off-by: Wenchen Fan
---
 .../catalyst/optimizer/InjectRuntimeFilter.scala | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala
index 3bb7c4d1ceca..176e927b2d21 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala
@@ -123,21 +123,20 @@ object InjectRuntimeFilter extends Rule[LogicalPlan] with PredicateHelper with J
     case ExtractEquiJoinKeys(joinType, lkeys, rkeys, _, _, left, right, _) =>
       // Runtime filters use one side of the [[Join]] to build a set of join key values and prune
       // the other side of the [[Join]]. It's also OK to use a superset of the join key values
-      // (ignore null values) to do the pruning.
+      // (ignore null values) to do the pruning. We can also extract from the other side if the
+      // join keys are transitive, and the other side always produces a superset output of join
+      // key values. Any join side always produces a superset output of its corresponding
+      // join keys, but for transitive join keys we need to check the join type.
       // We assume other rules have already pushed predicates through join if possible.
       // So the predicate references won't pass on anymore.
       if (left.output.exists(_.semanticEquals(targetKey))) {
         extract(left, AttributeSet.empty, hasHitFilter = false, hasHitSelectiveFilter = false,
           currentPlan = left, targetKey = targetKey).orElse {
-          // We can also extract from the right side if the join keys are transitive, and
-          // the right side always produces a superset output of join left keys.
-          // Let's look at an example
+          // An example that extracts from the right side if the join keys are transitive.
           // left table: 1, 2, 3
           // right table, 3, 4
-          // left outer join output: (1, null), (2, null), (3, 3)
-          // left key output: 1, 2, 3
-          // Any join side always produce a superset output of its corresponding
-          // join keys, but for transitive join keys we need to check the join type.
+          // right outer join output: (3, 3), (null, 4)
+          // right key output: 3, 4
           if (canPruneLeft(joinType)) {
             lkeys.zip(rkeys).find(_._1.semanticEquals(targetKey)).map(_._2)
               .flatMap { newTargetKey =>
@@ -152,7 +151,11 @@ object InjectRuntimeFilter extends Rule[LogicalPlan] with PredicateHelper with J
       } else if (right.output.exists(_.semanticEquals(targetKey))) {
         extract(right, AttributeSet.empty, hasHitFilter = false, hasHitSelectiveFilter = false,
           currentPlan = right, targetKey = targetKey).orElse {
-          // We can also extract from the left side if the join keys are transitive.
+          // An example that extracts from the left side if the join keys are transitive.
+          // left table: 1, 2, 3
+          // right table, 3, 4
+          // left outer join output: (1, null), (2, null), (3, 3)
+          // left key output: 1, 2, 3
           if (canPruneRight(joinType)) {
             rkeys.zip(lkeys).find(_._1.semanticEquals(targetKey)).map(_._2)
               .flatMap { newTargetKey =>
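The new comments can be checked directly. A hedged, runnable illustration of the commit's own example tables from a spark-shell (it assumes a local session): for a right outer join, the right side's key output is a superset of the surviving join-key values, so a runtime filter built from it is safe.
```
spark.sql(
  "SELECT * FROM VALUES 1, 2, 3 AS l(a) RIGHT OUTER JOIN VALUES 3, 4 AS r(b) ON a = b"
).show()
// expected rows (order may vary): (3, 3) and (null, 4),
// i.e. the right key output 3, 4 covers every surviving value of `a`.
```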
(spark) branch branch-3.5 updated: [SPARK-48241][SQL][3.5] CSV parsing failure with char/varchar type columns
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 19d12b249f0f [SPARK-48241][SQL][3.5] CSV parsing failure with char/varchar type columns 19d12b249f0f is described below commit 19d12b249f0fe4cb5b20b9722188c5a850147cec Author: joey.ljy AuthorDate: Tue May 14 13:06:57 2024 +0800 [SPARK-48241][SQL][3.5] CSV parsing failure with char/varchar type columns ### What changes were proposed in this pull request? CSV table containing char and varchar columns will result in the following error when selecting from the CSV table: ``` spark-sql (default)> show create table test_csv; CREATE TABLE default.test_csv ( id INT, name CHAR(10)) USING csv ``` ``` java.lang.IllegalArgumentException: requirement failed: requiredSchema (struct) should be the subset of dataSchema (struct). at scala.Predef$.require(Predef.scala:281) at org.apache.spark.sql.catalyst.csv.UnivocityParser.(UnivocityParser.scala:56) at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127) at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155) at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125) ``` ### Why are the changes needed? For char and varchar types, Spark will convert them to `StringType` in `CharVarcharUtils.replaceCharVarcharWithStringInSchema` and record `__CHAR_VARCHAR_TYPE_STRING` in the metadata. The reason for the above error is that the `StringType` columns in the `dataSchema` and `requiredSchema` of `UnivocityParser` are not consistent. The `StringType` in the `dataSchema` has metadata, while the metadata in the `requiredSchema` is empty. We need to retain the metadata when resolving schema. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Add a new test case in `CSVSuite`. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46565 from liujiayi771/branch-3.5-SPARK-48241. Authored-by: joey.ljy Signed-off-by: Wenchen Fan --- .../sql/catalyst/plans/logical/LogicalPlan.scala | 4 +++- sql/core/src/test/resources/test-data/char.csv | 4 .../sql/execution/datasources/csv/CSVSuite.scala | 24 ++ 3 files changed, 31 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala index 374eb070db1c..7fe8bd356ea9 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala @@ -116,7 +116,9 @@ abstract class LogicalPlan def resolve(schema: StructType, resolver: Resolver): Seq[Attribute] = { schema.map { field => resolve(field.name :: Nil, resolver).map { -case a: AttributeReference => a +case a: AttributeReference => + // Keep the metadata in given schema. 
+ a.withMetadata(field.metadata) case _ => throw QueryExecutionErrors.resolveCannotHandleNestedSchema(this) }.getOrElse { throw QueryCompilationErrors.cannotResolveAttributeError( diff --git a/sql/core/src/test/resources/test-data/char.csv b/sql/core/src/test/resources/test-data/char.csv new file mode 100644 index ..d2be68a15fc1 --- /dev/null +++ b/sql/core/src/test/resources/test-data/char.csv @@ -0,0 +1,4 @@ +color,name +pink,Bob +blue,Mike +grey,Tom diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala index a91adb787838..3762c00ff1a1 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala @@ -80,6 +80,7 @@ abstract class CSVSuite private val valueMalformedFile = "test-data/value-malformed.csv" private val badAfterGoodFile = "test-data/bad_after_good.csv" privat
(spark) branch master updated: [SPARK-48241][SQL] CSV parsing failure with char/varchar type columns
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b14abb3a2ed0 [SPARK-48241][SQL] CSV parsing failure with char/varchar type columns b14abb3a2ed0 is described below commit b14abb3a2ed086d2ff8f340f60c0dc1e460c7a59 Author: joey.ljy AuthorDate: Mon May 13 22:42:31 2024 +0800 [SPARK-48241][SQL] CSV parsing failure with char/varchar type columns ### What changes were proposed in this pull request? CSV table containing char and varchar columns will result in the following error when selecting from the CSV table: ``` spark-sql (default)> show create table test_csv; CREATE TABLE default.test_csv ( id INT, name CHAR(10)) USING csv ``` ``` java.lang.IllegalArgumentException: requirement failed: requiredSchema (struct) should be the subset of dataSchema (struct). at scala.Predef$.require(Predef.scala:281) at org.apache.spark.sql.catalyst.csv.UnivocityParser.(UnivocityParser.scala:56) at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127) at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155) at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125) ``` ### Why are the changes needed? For char and varchar types, Spark will convert them to `StringType` in `CharVarcharUtils.replaceCharVarcharWithStringInSchema` and record `__CHAR_VARCHAR_TYPE_STRING` in the metadata. The reason for the above error is that the `StringType` columns in the `dataSchema` and `requiredSchema` of `UnivocityParser` are not consistent. The `StringType` in the `dataSchema` has metadata, while the metadata in the `requiredSchema` is empty. We need to retain the metadata when resolving schema. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Add a new test case in `CSVSuite`. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46537 from liujiayi771/csv-char. Authored-by: joey.ljy Signed-off-by: Wenchen Fan --- .../sql/catalyst/plans/logical/LogicalPlan.scala | 4 +++- sql/core/src/test/resources/test-data/char.csv | 4 .../sql/execution/datasources/csv/CSVSuite.scala | 24 ++ 3 files changed, 31 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala index b989233da674..98e91585c2a0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala @@ -118,7 +118,9 @@ abstract class LogicalPlan def resolve(schema: StructType, resolver: Resolver): Seq[Attribute] = { schema.map { field => resolve(field.name :: Nil, resolver).map { -case a: AttributeReference => a +case a: AttributeReference => + // Keep the metadata in given schema. 
+ a.withMetadata(field.metadata) case _ => throw QueryExecutionErrors.resolveCannotHandleNestedSchema(this) }.getOrElse { throw QueryCompilationErrors.cannotResolveAttributeError( diff --git a/sql/core/src/test/resources/test-data/char.csv b/sql/core/src/test/resources/test-data/char.csv new file mode 100644 index ..d2be68a15fc1 --- /dev/null +++ b/sql/core/src/test/resources/test-data/char.csv @@ -0,0 +1,4 @@ +color,name +pink,Bob +blue,Mike +grey,Tom diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala index 22ea133ee19a..0e58b96531da 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala @@ -80,6 +80,7 @@ abstract class CSVSuite private val valueMalformedFile = "test-data/value-malformed.csv" private val badAfterGoodFile = "test-data/bad_after_good.csv" private val malformedRowFile = "test-data/m
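Following the two copies of this fix above, a hedged repro sketch from a spark-shell (it assumes a local session with a writable warehouse; the table mirrors the description's example):
```
spark.sql("CREATE TABLE test_csv (id INT, name CHAR(10)) USING csv")
spark.sql("INSERT INTO test_csv VALUES (1, 'Bob')")
// Before this fix: IllegalArgumentException, because the CHAR column's
// __CHAR_VARCHAR_TYPE_STRING metadata was dropped during resolution and the
// requiredSchema-subset check failed. After it: returns the row.
spark.sql("SELECT * FROM test_csv").show()
```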
(spark) branch master updated (42f2132d1fc9 -> 3456d4f69a86)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

from 42f2132d1fc9 [SPARK-48206][SQL][TESTS] Add tests for window rewrites with RewriteWithExpression
 add 3456d4f69a86 [SPARK-47681][FOLLOWUP] Fix schema_of_variant(decimal)

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/variant/variantExpressions.scala | 7 +++
 .../test/scala/org/apache/spark/sql/VariantEndToEndSuite.scala | 10 ++
 2 files changed, 13 insertions(+), 4 deletions(-)
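For the `schema_of_variant(decimal)` follow-up referenced above, a hedged spark-shell sketch (it assumes a build with the variant functions; the exact reported type string is illustrative, not taken from the commit):
```
spark.sql("SELECT schema_of_variant(parse_json('1.23'))").show()
// the reported type for a decimal value such as 1.23 should be a DECIMAL type
// (rather than a mis-typed result before this follow-up)
```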
(spark) branch master updated: [SPARK-48206][SQL][TESTS] Add tests for window rewrites with RewriteWithExpression
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 42f2132d1fc9 [SPARK-48206][SQL][TESTS] Add tests for window rewrites with RewriteWithExpression 42f2132d1fc9 is described below commit 42f2132d1fc99bf2ec5bd398d21dcbdbd5cbde47 Author: Kelvin Jiang AuthorDate: Mon May 13 22:28:27 2024 +0800 [SPARK-48206][SQL][TESTS] Add tests for window rewrites with RewriteWithExpression ### What changes were proposed in this pull request? This PR adds more testing for `RewriteWithExpression` around `Window` operators. ### Why are the changes needed? Adds more testing for `RewriteWithExpression`, which can be fragile around `WindowExpressions`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46492 from kelvinjian-db/SPARK-48206-window. Authored-by: Kelvin Jiang Signed-off-by: Wenchen Fan --- .../optimizer/RewriteWithExpressionSuite.scala | 223 + 1 file changed, 135 insertions(+), 88 deletions(-) diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpressionSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpressionSuite.scala index 8f023fa4156b..aa8ffb2b0454 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpressionSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpressionSuite.scala @@ -24,7 +24,6 @@ import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.plans.PlanTest import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan} import org.apache.spark.sql.catalyst.rules.RuleExecutor -import org.apache.spark.sql.types.IntegerType class RewriteWithExpressionSuite extends PlanTest { @@ -37,6 +36,20 @@ class RewriteWithExpressionSuite extends PlanTest { private val testRelation = LocalRelation($"a".int, $"b".int) private val testRelation2 = LocalRelation($"x".int, $"y".int) + private def normalizeCommonExpressionIds(plan: LogicalPlan): LogicalPlan = { +plan.transformAllExpressions { + case a: Alias if a.name.startsWith("_common_expr") => +a.withName("_common_expr_0") + case a: AttributeReference if a.name.startsWith("_common_expr") => +a.withName("_common_expr_0") +} + } + + override def comparePlans( +plan1: LogicalPlan, plan2: LogicalPlan, checkAnalysis: Boolean = true): Unit = { +super.comparePlans(normalizeCommonExpressionIds(plan1), normalizeCommonExpressionIds(plan2)) + } + test("simple common expression") { val a = testRelation.output.head val expr = With(a) { case Seq(ref) => @@ -52,65 +65,48 @@ class RewriteWithExpressionSuite extends PlanTest { ref * ref } val plan = testRelation.select(expr.as("col")) -val commonExprId = expr.defs.head.id.id -val commonExprName = s"_common_expr_$commonExprId" comparePlans( Optimizer.execute(plan), testRelation -.select((testRelation.output :+ (a + a).as(commonExprName)): _*) -.select(($"$commonExprName" * $"$commonExprName").as("col")) +.select((testRelation.output :+ (a + a).as("_common_expr_0")): _*) +.select(($"_common_expr_0" * $"_common_expr_0").as("col")) .analyze ) } test("nested WITH expression in the definition expression") { -val a = testRelation.output.head +val Seq(a, b) = testRelation.output val innerExpr 
= With(a + a) { case Seq(ref) => ref + ref } -val innerCommonExprId = innerExpr.defs.head.id.id -val innerCommonExprName = s"_common_expr_$innerCommonExprId" - -val b = testRelation.output.last val outerExpr = With(innerExpr + b) { case Seq(ref) => ref * ref } -val outerCommonExprId = outerExpr.defs.head.id.id -val outerCommonExprName = s"_common_expr_$outerCommonExprId" val plan = testRelation.select(outerExpr.as("col")) -val rewrittenOuterExpr = ($"$innerCommonExprName" + $"$innerCommonExprName" + b) - .as(outerCommonExprName) -val outerExprAttr = AttributeReference(outerCommonExprName, IntegerType)( - exprId = rewrittenOuterExpr.exprId) comparePlans( Optimizer.execute(plan), testRelation -.selec
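To make the rewrite under test concrete, here is a simplified, self-contained model (these are not Spark's Catalyst classes): a With expression's definitions are materialized once as `_common_expr_<id>` columns and the body's references are redirected to them, mirroring the suite's first test case.
```
object CommonExprRewrite {
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class Add(l: Expr, r: Expr) extends Expr
  case class Mul(l: Expr, r: Expr) extends Expr
  case class Ref(id: Int) extends Expr
  case class WithExpr(defs: Map[Int, Expr], body: Expr) extends Expr

  // Redirect each common-expression reference to its named column.
  def substitute(e: Expr, ids: Set[Int]): Expr = e match {
    case Ref(id) if ids(id)   => Attr(s"_common_expr_$id")
    case Add(l, r)            => Add(substitute(l, ids), substitute(r, ids))
    case Mul(l, r)            => Mul(substitute(l, ids), substitute(r, ids))
    case WithExpr(defs, body) => WithExpr(defs, substitute(body, ids))
    case other                => other
  }

  // Flatten a With into (bindings to evaluate once, rewritten body).
  def rewrite(e: Expr): (Seq[(String, Expr)], Expr) = e match {
    case WithExpr(defs, body) =>
      val bindings = defs.toSeq.map { case (id, d) => s"_common_expr_$id" -> d }
      (bindings, substitute(body, defs.keySet))
    case other => (Nil, other)
  }

  def main(args: Array[String]): Unit = {
    // With(a + a) { ref => ref * ref }, as in the suite's first test:
    val expr = WithExpr(Map(0 -> Add(Attr("a"), Attr("a"))), Mul(Ref(0), Ref(0)))
    println(rewrite(expr))
    // (_common_expr_0 -> Add(a, a), Mul(_common_expr_0, _common_expr_0))
  }
}
```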
svn commit: r69098 - /dev/spark/v4.0.0-preview1-rc1-bin/
Author: wenchen Date: Sat May 11 04:28:26 2024 New Revision: 69098 Log: Apache Spark v4.0.0-preview1-rc1 Added: dev/spark/v4.0.0-preview1-rc1-bin/ dev/spark/v4.0.0-preview1-rc1-bin/SparkR_4.0.0-preview1.tar.gz (with props) dev/spark/v4.0.0-preview1-rc1-bin/SparkR_4.0.0-preview1.tar.gz.asc dev/spark/v4.0.0-preview1-rc1-bin/SparkR_4.0.0-preview1.tar.gz.sha512 dev/spark/v4.0.0-preview1-rc1-bin/pyspark-4.0.0.dev1.tar.gz (with props) dev/spark/v4.0.0-preview1-rc1-bin/pyspark-4.0.0.dev1.tar.gz.asc dev/spark/v4.0.0-preview1-rc1-bin/pyspark-4.0.0.dev1.tar.gz.sha512 dev/spark/v4.0.0-preview1-rc1-bin/pyspark-connect-4.0.0.dev1.tar.gz.sha512 dev/spark/v4.0.0-preview1-rc1-bin/spark-4.0.0-preview1-bin-hadoop3.tgz (with props) dev/spark/v4.0.0-preview1-rc1-bin/spark-4.0.0-preview1-bin-hadoop3.tgz.asc dev/spark/v4.0.0-preview1-rc1-bin/spark-4.0.0-preview1-bin-hadoop3.tgz.sha512 dev/spark/v4.0.0-preview1-rc1-bin/spark-4.0.0-preview1-bin-without-hadoop.tgz (with props) dev/spark/v4.0.0-preview1-rc1-bin/spark-4.0.0-preview1-bin-without-hadoop.tgz.asc dev/spark/v4.0.0-preview1-rc1-bin/spark-4.0.0-preview1-bin-without-hadoop.tgz.sha512 dev/spark/v4.0.0-preview1-rc1-bin/spark-4.0.0-preview1.tgz (with props) dev/spark/v4.0.0-preview1-rc1-bin/spark-4.0.0-preview1.tgz.asc dev/spark/v4.0.0-preview1-rc1-bin/spark-4.0.0-preview1.tgz.sha512 Added: dev/spark/v4.0.0-preview1-rc1-bin/SparkR_4.0.0-preview1.tar.gz == Binary file - no diff available. Propchange: dev/spark/v4.0.0-preview1-rc1-bin/SparkR_4.0.0-preview1.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v4.0.0-preview1-rc1-bin/SparkR_4.0.0-preview1.tar.gz.asc == --- dev/spark/v4.0.0-preview1-rc1-bin/SparkR_4.0.0-preview1.tar.gz.asc (added) +++ dev/spark/v4.0.0-preview1-rc1-bin/SparkR_4.0.0-preview1.tar.gz.asc Sat May 11 04:28:26 2024 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJHBAABCgAxFiEETclnbO+ag+mPygJ4TWYghDzYf1oFAmY+8UQTHHdlbmNoZW5A +YXBhY2hlLm9yZwAKCRBNZiCEPNh/WkV1D/44BoMRwBQPQybc9ldlemMhKNQ/1OLB +mUwhLpeUryOpUjO8AXa60YBajHqg9hivRxAUiuoaBSn7HjWY+3+nwkbcA7ZyMaV2 +Hgvfu4orB2kYXx4JgiE+dd2Zbuq+HFTv32dDUe+FyiHvhFw/bL0TIYUNJfKNcBtq +KZDl9K5wemNjmpUSQAfEh3/vkikv5xOGxV+yEohgpB3t5Wg3hTETISXLfx/mHDu5 +GPjdCZ1omcqxZsV16CFZHV/uzK5aEDXfPdo2OO5V94xyQL0EQaMnzzMUdHkxPJ3p +747tTf/q5rXHOb7S67MtNoBZ8myR23mQGJTwlV6E8CJWcbH7R6SEHekG9kIPGd3i +UHoBAmroi+KfAdRej2Nqvz7SfeDeAmFw2kBRIm42FYWIqalAqbKU9LlXSpjyvYkO +82df+5mwOzJf5VSU9D3krmjqWMFdjlLbDI1O1hLMNHyZkCYzPf+pmFhABsfGMXZH +D8vURqF5aL9BmEuwi1SF0zSa9bI0otQj0DBvCbZnUeULSHB+P/eFqHoXjtNX2ArB +43zmyaDywfqPXoMItvb+sGGUvatbLTCjjl6yfwgZEKOHs5noCygmL1WoLVQV+UYe +UXb/hOJrP4FdUARpnMmz6R0NYSgQ7RZ7lOjQqs3VB7W1ashh0EWDD1hbeqMpvdx/ ++fBbOLMrdzxifw== +=2il7 +-END PGP SIGNATURE- Added: dev/spark/v4.0.0-preview1-rc1-bin/SparkR_4.0.0-preview1.tar.gz.sha512 == --- dev/spark/v4.0.0-preview1-rc1-bin/SparkR_4.0.0-preview1.tar.gz.sha512 (added) +++ dev/spark/v4.0.0-preview1-rc1-bin/SparkR_4.0.0-preview1.tar.gz.sha512 Sat May 11 04:28:26 2024 @@ -0,0 +1 @@ +60c0f5348da36d3399b596648e104202b2e9925a5b52694bf83cec9be1b4e78db6b4aa7f2f9257bca74dd514dca176ab6b51ab4c0abad2b31fb3fc5b5c14 SparkR_4.0.0-preview1.tar.gz Added: dev/spark/v4.0.0-preview1-rc1-bin/pyspark-4.0.0.dev1.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v4.0.0-preview1-rc1-bin/pyspark-4.0.0.dev1.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v4.0.0-preview1-rc1-bin/pyspark-4.0.0.dev1.tar.gz.asc == --- dev/spark/v4.0.0-preview1-rc1-bin/pyspark-4.0.0.dev1.tar.gz.asc (added) +++ dev/spark/v4.0.0-preview1-rc1-bin/pyspark-4.0.0.dev1.tar.gz.asc Sat May 11 04:28:26 2024 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJHBAABCgAxFiEETclnbO+ag+mPygJ4TWYghDzYf1oFAmY+8UYTHHdlbmNoZW5A +YXBhY2hlLm9yZwAKCRBNZiCEPNh/WsnjD/4m0Dyb8ZcxS/JScvFxl3eg7KRWi8d8 +bGHs/pHZxdwS/HUkBRtv0w6HXJV6ZtQW1CPtbZ0VKOqElUfGPS/VaxE91I7c2Vmb ++/P2/buVX6fBlF+vIUPECyVgblnhBeZKbBb5Wcz3xpL1Jfj/6qi3o9uLnFFfy55S +N6FWIJ5xrjl9mlo6+s4qqL/06u982NaEyUsu51eNgapTQcNUAjFKme13WC3W7n0S +i6ixtW1oXmfY74CzSfn6KNC+5QvxKwJznS7ZxrG3g/chcaR8rApUZ526v4XL7LP0 +BDNeqCI+blAjVYFUzBIkvZp8SR/BbJv2HSySq5hbf0S6l0O+iuj8tZ/oa8Z0hCNf +lXUw2ORG7RJKUZePdC+F+vYrmISyDRiWb4ddSUAjkzXy8KEWw6y55VULCq4vHbDc +1Zwmf2izaujavcSJMjBnMhoZZ1PBlxgVQwHYu0Pi3qLCxyIn4oTd1wW7h6u5IGMr ++1LjMaGCrKbWSafp+cXGtzfJGjzPjCdIN2HqX6l53Vli4jn8I8yGJZs7hp+SZ281 +QBmzgiDLWUdQf+72bGNNlvy1FliPg0k7
svn commit: r69097 - /dev/spark/v4.0.0-preview1-rc1-bin/
Author: wenchen Date: Sat May 11 03:59:33 2024 New Revision: 69097 Log: prepare for re-uploading Removed: dev/spark/v4.0.0-preview1-rc1-bin/
svn commit: r69092 - in /dev/spark/v4.0.0-preview1-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/articles/sparkr-vignettes_files/ _site/api/R/articles/sparkr-vignettes_
Author: wenchen Date: Fri May 10 16:44:08 2024 New Revision: 69092 Log: Apache Spark v4.0.0-preview1-rc1 docs [This commit notification would consist of 4810 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
(spark) branch master updated: [SPARK-48143][SQL] Use lightweight exceptions for control-flow between UnivocityParser and FailureSafeParser
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a6632ffa16f6 [SPARK-48143][SQL] Use lightweight exceptions for control-flow between UnivocityParser and FailureSafeParser a6632ffa16f6 is described below commit a6632ffa16f6907eba96e745920d571924bf4b63 Author: Vladimir Golubev AuthorDate: Sat May 11 00:37:54 2024 +0800 [SPARK-48143][SQL] Use lightweight exceptions for control-flow between UnivocityParser and FailureSafeParser # What changes were proposed in this pull request? New lightweight exception for control-flow between UnivocityParser and FalureSafeParser to speed-up malformed CSV parsing. This is a different way to implement these reverted changes: https://github.com/apache/spark/pull/46478 The previous implementation was more invasive - removing `cause` from `BadRecordException` could break upper code, which unwraps errors and checks the types of the causes. This implementation only touches `FailureSafeParser` and `UnivocityParser` since in the codebase they are always used together, unlike `JacksonParser` and `StaxXmlParser`. Removing stacktrace from `BadRecordException` is safe, since the cause itself has an adequate stacktrace (except pure control-flow cases). ### Why are the changes needed? Parsing in `PermissiveMode` is slow due to heavy exception construction (stacktrace filling + string template substitution in `SparkRuntimeException`) ### Does this PR introduce _any_ user-facing change? No, since `FailureSafeParser` unwraps `BadRecordException` and correctly rethrows user-facing exceptions in `FailFastMode` ### How was this patch tested? - `testOnly org.apache.spark.sql.catalyst.csv.UnivocityParserSuite` - Manually run csv benchmark - Manually checked correct and malformed csv in sherk-shell (org.apache.spark.SparkException is thrown with the stacktrace) ### Was this patch authored or co-authored using generative AI tooling? No Closes #46500 from vladimirg-db/vladimirg-db/use-special-lighweight-exception-for-control-flow-between-univocity-parser-and-failure-safe-parser. Authored-by: Vladimir Golubev Signed-off-by: Wenchen Fan --- .../spark/sql/catalyst/csv/UnivocityParser.scala | 5 +++-- .../sql/catalyst/util/BadRecordException.scala | 22 +++--- .../sql/catalyst/util/FailureSafeParser.scala | 11 +-- 3 files changed, 31 insertions(+), 7 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala index a5158d8a22c6..4d95097e1681 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala @@ -316,7 +316,7 @@ class UnivocityParser( throw BadRecordException( () => getCurrentInput, () => Array.empty, -QueryExecutionErrors.malformedCSVRecordError("")) +LazyBadRecordCauseWrapper(() => QueryExecutionErrors.malformedCSVRecordError(""))) } val currentInput = getCurrentInput @@ -326,7 +326,8 @@ class UnivocityParser( // However, we still have chance to parse some of the tokens. It continues to parses the // tokens normally and sets null when `ArrayIndexOutOfBoundsException` occurs for missing // tokens. 
-        Some(QueryExecutionErrors.malformedCSVRecordError(currentInput.toString))
+        Some(LazyBadRecordCauseWrapper(
+          () => QueryExecutionErrors.malformedCSVRecordError(currentInput.toString)))
       } else None
       // When the length of the returned tokens is identical to the length of the parsed schema,
       // we just need to:

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala
index 65a56c1064e4..654b0b8c73e5 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala
@@ -67,16 +67,32 @@ case class PartialResultArrayException(
   extends Exception(cause)

 /**
- * Exception thrown when the underlying parser meet a bad record and can't parse it.
+ * Exception thrown when the underlying parser met a bad record and can't parse it.
+ * The stacktrace is not collected for better performance, and thus, this exception should
+ * not be used in a user-facing context.
  * @param record a function to return the record that cause the parser to fail
  * @param partialResults a fu
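For readers unfamiliar with the trick this PR relies on, the JVM-level mechanism is the four-argument `Throwable` constructor, which can disable stack-trace collection entirely, combined with lazily built causes. Below is a minimal, illustrative sketch of the technique; all class and method names here are hypothetical stand-ins, not the actual Spark classes (`BadRecordException`, `LazyBadRecordCauseWrapper`, `FailureSafeParser`):

```scala
// Minimal sketch of a control-flow exception that skips fillInStackTrace().
// writableStackTrace = false is what makes throwing cheap enough to use as
// per-record control flow; the heavyweight, user-facing cause is built lazily.
final class ControlFlowRecordException(
    val badRecord: () => String,        // lazy: only materialized on demand
    val buildCause: () => Throwable)    // lazy: the expensive, user-facing error
  extends Exception(
    null,   // no message: this exception never reaches the user directly
    null,   // no eager cause
    false,  // enableSuppression = false
    false)  // writableStackTrace = false: skips stack-trace collection

def parse(line: String): Array[String] =
  if (line.nonEmpty) line.split(",")
  else throw new ControlFlowRecordException(
    () => line,
    () => new IllegalArgumentException("Malformed record")) // not built yet

// The safe-parser side decides, per parse mode, whether to pay for the real
// exception (fail-fast) or to silently drop the row (permissive).
def parseSafely(line: String, failFast: Boolean): Option[Array[String]] =
  try Some(parse(line))
  catch {
    case e: ControlFlowRecordException =>
      if (failFast) throw e.buildCause() // heavyweight error built only here
      else None                          // permissive mode: skip the bad record
  }
```

Because `writableStackTrace` is false, constructing and throwing the exception costs little more than a small allocation, which matters when a large fraction of input rows are malformed.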
(spark) branch master updated: [SPARK-48146][SQL] Fix aggregate function in With expression child assertion
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 7ef0440ef221 [SPARK-48146][SQL] Fix aggregate function in With expression child assertion

7ef0440ef221 is described below

commit 7ef0440ef22161a6160f7b9000c70b26c84eecf7
Author: Kelvin Jiang
AuthorDate: Fri May 10 22:39:15 2024 +0800

    [SPARK-48146][SQL] Fix aggregate function in With expression child assertion

    ### What changes were proposed in this pull request?
    In https://github.com/apache/spark/pull/46034, there was a complicated edge case where common expression references in aggregate functions in the child of a `With` expression could become dangling. An assertion was added to prevent that case from happening, but the assertion wasn't fully accurate: a query like

    ```
    select id between max(if(id between 1 and 2, 2, 1)) over () and id from range(10)
    ```

    would fail the assertion. This PR fixes the assertion to be more accurate.

    ### Why are the changes needed?
    This addresses a regression in https://github.com/apache/spark/pull/46034.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Added unit tests.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

Closes #46443 from kelvinjian-db/SPARK-48146-agg.

Authored-by: Kelvin Jiang
Signed-off-by: Wenchen Fan
---
 .../spark/sql/catalyst/expressions/With.scala  | 26 +
 .../optimizer/RewriteWithExpressionSuite.scala | 27 +-
 2 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/With.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/With.scala
index 14deedd9c70f..29794b33641c 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/With.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/With.scala
@@ -17,7 +17,8 @@

 package org.apache.spark.sql.catalyst.expressions

-import org.apache.spark.sql.catalyst.trees.TreePattern.{AGGREGATE_EXPRESSION, COMMON_EXPR_REF, TreePattern, WITH_EXPRESSION}
+import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
+import org.apache.spark.sql.catalyst.trees.TreePattern.{COMMON_EXPR_REF, TreePattern, WITH_EXPRESSION}
 import org.apache.spark.sql.types.DataType

 /**
@@ -27,9 +28,11 @@ import org.apache.spark.sql.types.DataType
  */
 case class With(child: Expression, defs: Seq[CommonExpressionDef])
   extends Expression with Unevaluable {
-  // We do not allow With to be created with an AggregateExpression in the child, as this would
-  // create a dangling CommonExpressionRef after rewriting it in RewriteWithExpression.
-  assert(!child.containsPattern(AGGREGATE_EXPRESSION))
+  // We do not allow creating a With expression with an AggregateExpression that contains a
+  // reference to a common expression defined in that scope (note that it can contain another With
+  // expression with a common expression ref of the inner With). This is to prevent the creation of
+  // a dangling CommonExpressionRef after rewriting it in RewriteWithExpression.
+  assert(!With.childContainsUnsupportedAggExpr(this))

   override val nodePatterns: Seq[TreePattern] = Seq(WITH_EXPRESSION)

   override def dataType: DataType = child.dataType
@@ -92,6 +95,21 @@ object With {
     val commonExprRefs = commonExprDefs.map(new CommonExpressionRef(_))
     With(replaced(commonExprRefs), commonExprDefs)
   }
+
+  private[sql] def childContainsUnsupportedAggExpr(withExpr: With): Boolean = {
+    lazy val commonExprIds = withExpr.defs.map(_.id).toSet
+    withExpr.child.exists {
+      case agg: AggregateExpression =>
+        // Check that the aggregate expression does not contain a reference to a common expression
+        // in the outer With expression (it is ok if it contains a reference to a common expression
+        // for a nested With expression).
+        agg.exists {
+          case r: CommonExpressionRef => commonExprIds.contains(r.id)
+          case _ => false
+        }
+      case _ => false
+    }
+  }
 }

 case class CommonExpressionId(id: Long = CommonExpressionId.newId, canonicalized: Boolean = false) {

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpressionSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpressionSuite.scala
index d482b18d9331..8f023fa4156b 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpressionSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/
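The essence of the refined assertion is scope awareness: an aggregate may reference common expressions of a *nested* `With`, just not those defined by the enclosing one. The toy model below illustrates that check on a simplified tree; the case classes are hypothetical stand-ins, not Catalyst's actual `Expression` hierarchy:

```scala
// Toy expression tree; names are illustrative, not Catalyst's real classes.
sealed trait Expr {
  def children: Seq[Expr]
  def exists(p: Expr => Boolean): Boolean = p(this) || children.exists(_.exists(p))
}
case class Agg(child: Expr) extends Expr { def children: Seq[Expr] = Seq(child) }
case class Ref(id: Long) extends Expr { def children: Seq[Expr] = Nil }
case class Wth(child: Expr, defIds: Set[Long]) extends Expr { def children: Seq[Expr] = Seq(child) }

// A ref becomes dangling only when an aggregate references a common expression
// defined by the *enclosing* With; refs whose ids belong to a nested With are
// fine, because they are rewritten within that nested scope.
def childContainsUnsupportedAgg(child: Expr, outerDefIds: Set[Long]): Boolean =
  child.exists {
    case Agg(body) =>
      body.exists {
        case Ref(id) => outerDefIds.contains(id)
        case _       => false
      }
    case _ => false
  }

// An aggregate over a ref defined by the outer With: rejected.
assert(childContainsUnsupportedAgg(Agg(Ref(1)), outerDefIds = Set(1)))
// The same aggregate over a nested With's own ref: accepted.
assert(!childContainsUnsupportedAgg(Agg(Wth(Ref(2), Set(2))), outerDefIds = Set(1)))
```

The old assertion rejected *any* aggregate in the child (the blanket `AGGREGATE_EXPRESSION` pattern check), which is why the window-function query above failed spuriously; the new one only rejects aggregates that actually reference the enclosing scope's definitions.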
(spark) branch master updated (33cac4436e59 -> 2df494fd4e4e)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

    from 33cac4436e59 [SPARK-47847][CORE] Deprecate `spark.network.remoteReadNioBufferConversion`
     add 2df494fd4e4e [SPARK-48158][SQL] Add collation support for XML expressions

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/xmlExpressions.scala |   9 +-
 .../spark/sql/CollationSQLExpressionsSuite.scala  | 124 +
 2 files changed, 129 insertions(+), 4 deletions(-)
(spark) branch master updated: [SPARK-48222][INFRA][DOCS] Sync Ruby Bundler to 2.4.22 and refresh Gem lock file
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 9a2818820f11 [SPARK-48222][INFRA][DOCS] Sync Ruby Bundler to 2.4.22 and refresh Gem lock file

9a2818820f11 is described below

commit 9a2818820f11f9bdcc042f4ab80850918911c68c
Author: Nicholas Chammas
AuthorDate: Fri May 10 09:58:16 2024 +0800

    [SPARK-48222][INFRA][DOCS] Sync Ruby Bundler to 2.4.22 and refresh Gem lock file

    ### What changes were proposed in this pull request?
    Sync the version of Bundler that we are using across various scripts and documentation. Also refresh the Gem lock file.

    ### Why are the changes needed?
    We are seeing inconsistent build behavior, likely due to the inconsistent Bundler versions.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    CI + the preview release process.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

Closes #46512 from nchammas/bundler-sync.

Authored-by: Nicholas Chammas
Signed-off-by: Wenchen Fan
---
 .github/workflows/build_and_test.yml   |  3 +++
 dev/create-release/spark-rm/Dockerfile |  2 +-
 docs/Gemfile.lock                      | 16
 docs/README.md                         |  2 +-
 4 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 4a11823aee60..881fb8cb0674 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -872,6 +872,9 @@ jobs:
         python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421
     - name: Install dependencies for documentation generation
       run: |
+        # Keep the version of Bundler here in sync with the following locations:
+        # - dev/create-release/spark-rm/Dockerfile
+        # - docs/README.md
         gem install bundler -v 2.4.22
         cd docs
         bundle install
diff --git a/dev/create-release/spark-rm/Dockerfile b/dev/create-release/spark-rm/Dockerfile
index 8d5ca38ba88e..13f4112ca03d 100644
--- a/dev/create-release/spark-rm/Dockerfile
+++ b/dev/create-release/spark-rm/Dockerfile
@@ -38,7 +38,7 @@ ENV DEBCONF_NONINTERACTIVE_SEEN true
 ARG APT_INSTALL="apt-get install --no-install-recommends -y"
 ARG PIP_PKGS="sphinx==4.5.0 mkdocs==1.1.2 numpy==1.20.3 pydata_sphinx_theme==0.13.3 ipython==7.19.0 nbsphinx==0.8.0 numpydoc==1.1.0 jinja2==3.1.2 twine==3.4.1 sphinx-plotly-directive==0.1.3 sphinx-copybutton==0.5.2 pandas==2.0.3 pyarrow==10.0.1 plotly==5.4.0 markupsafe==2.0.1 docutils<0.17 grpcio==1.62.0 protobuf==4.21.6 grpcio-status==1.62.0 googleapis-common-protos==1.56.4"
-ARG GEM_PKGS="bundler:2.3.8"
+ARG GEM_PKGS="bundler:2.4.22"

 # Install extra needed repos and refresh.
 # - CRAN repo
diff --git a/docs/Gemfile.lock b/docs/Gemfile.lock
index 4e38f18703f3..e137f0f039b9 100644
--- a/docs/Gemfile.lock
+++ b/docs/Gemfile.lock
@@ -4,16 +4,16 @@ GEM
     addressable (2.8.6)
       public_suffix (>= 2.0.2, < 6.0)
     colorator (1.1.0)
-    concurrent-ruby (1.2.2)
+    concurrent-ruby (1.2.3)
     em-websocket (0.5.3)
       eventmachine (>= 0.12.9)
       http_parser.rb (~> 0)
     eventmachine (1.2.7)
     ffi (1.16.3)
     forwardable-extended (2.6.0)
-    google-protobuf (3.25.2)
+    google-protobuf (3.25.3)
     http_parser.rb (0.8.0)
-    i18n (1.14.1)
+    i18n (1.14.5)
       concurrent-ruby (~> 1.0)
     jekyll (4.3.3)
       addressable (~> 2.4)
@@ -42,22 +42,22 @@ GEM
     kramdown-parser-gfm (1.1.0)
       kramdown (~> 2.0)
     liquid (4.0.4)
-    listen (3.8.0)
+    listen (3.9.0)
       rb-fsevent (~> 0.10, >= 0.10.3)
       rb-inotify (~> 0.9, >= 0.9.10)
     mercenary (0.4.0)
     pathutil (0.16.2)
       forwardable-extended (~> 2.6)
-    public_suffix (5.0.4)
-    rake (13.1.0)
+    public_suffix (5.0.5)
+    rake (13.2.1)
     rb-fsevent (0.11.2)
     rb-inotify (0.10.1)
       ffi (~> 1.0)
     rexml (3.2.6)
     rouge (3.30.0)
     safe_yaml (1.0.5)
-    sass-embedded (1.69.7)
-      google-protobuf (~> 3.25)
+    sass-embedded (1.63.6)
+      google-protobuf (~> 3.23)
       rake (>= 13.0.0)
     terminal-table (3.0.2)
       unicode-display_width (>= 1.1.1, < 3)
diff --git a/docs/README.md b/docs/README.md
index 414c8dbd8303..363f1c207636 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -36,7 +36,7 @@ You need to have [Ruby 3][ruby] and [Python 3][python] installed. Make sure the

 [python]: https://www.python.org/downloads/

 ```sh
-$ gem install bundler
+$ gem install bundler -v 2.4.22
 ```

 After this all the required Ruby dependencies can be installed from the `docs/` directory
svn commit: r69065 - /dev/spark/v4.0.0-preview1-rc1-bin/
Author: wenchen
Date: Thu May 9 16:31:11 2024
New Revision: 69065

Log:
Apache Spark v4.0.0-preview1-rc1

Added:
    dev/spark/v4.0.0-preview1-rc1-bin/
    dev/spark/v4.0.0-preview1-rc1-bin/pyspark-4.0.0.dev1.tar.gz   (with props)
    dev/spark/v4.0.0-preview1-rc1-bin/pyspark-4.0.0.dev1.tar.gz.asc
    dev/spark/v4.0.0-preview1-rc1-bin/pyspark-4.0.0.dev1.tar.gz.sha512
    dev/spark/v4.0.0-preview1-rc1-bin/pyspark-connect-4.0.0.dev1.tar.gz   (with props)
    dev/spark/v4.0.0-preview1-rc1-bin/pyspark-connect-4.0.0.dev1.tar.gz.asc
    dev/spark/v4.0.0-preview1-rc1-bin/pyspark-connect-4.0.0.dev1.tar.gz.sha512
    dev/spark/v4.0.0-preview1-rc1-bin/spark-4.0.0-preview1-bin-hadoop3.tgz   (with props)
    dev/spark/v4.0.0-preview1-rc1-bin/spark-4.0.0-preview1-bin-hadoop3.tgz.asc
    dev/spark/v4.0.0-preview1-rc1-bin/spark-4.0.0-preview1-bin-hadoop3.tgz.sha512
    dev/spark/v4.0.0-preview1-rc1-bin/spark-4.0.0-preview1-bin-without-hadoop.tgz   (with props)
    dev/spark/v4.0.0-preview1-rc1-bin/spark-4.0.0-preview1-bin-without-hadoop.tgz.asc
    dev/spark/v4.0.0-preview1-rc1-bin/spark-4.0.0-preview1-bin-without-hadoop.tgz.sha512
    dev/spark/v4.0.0-preview1-rc1-bin/spark-4.0.0-preview1.tgz   (with props)
    dev/spark/v4.0.0-preview1-rc1-bin/spark-4.0.0-preview1.tgz.asc
    dev/spark/v4.0.0-preview1-rc1-bin/spark-4.0.0-preview1.tgz.sha512

Added: dev/spark/v4.0.0-preview1-rc1-bin/pyspark-4.0.0.dev1.tar.gz
==============================================================================
Binary file - no diff available.

Propchange: dev/spark/v4.0.0-preview1-rc1-bin/pyspark-4.0.0.dev1.tar.gz
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/spark/v4.0.0-preview1-rc1-bin/pyspark-4.0.0.dev1.tar.gz.asc
==============================================================================
--- dev/spark/v4.0.0-preview1-rc1-bin/pyspark-4.0.0.dev1.tar.gz.asc (added)
+++ dev/spark/v4.0.0-preview1-rc1-bin/pyspark-4.0.0.dev1.tar.gz.asc Thu May 9 16:31:11 2024
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQJHBAABCgAxFiEETclnbO+ag+mPygJ4TWYghDzYf1oFAmY8+e4THHdlbmNoZW5A
+YXBhY2hlLm9yZwAKCRBNZiCEPNh/Wv78D/9aNsBANuVpIjYr+XkWYaimRLJ5IT0Z
+qKehjJBuMBDaBMMN3iWconDHBiASQT0FTYGDBeYI72fLFSMKBna5+Lu22+KD/K6h
+V8SZxPSQsAHQABYq9ha++XXyo1Vo+msPQ0pQAblmTrSpsvSWZmC8spzb5GbKYvK5
+kxr4Qt1XnHeGNJNToqGlbl/Hc2Etg5PkPBxMPBWMh7kLknMEscMNUf87JqCIa8LG
+hMid/0lrrevEm8gkuu0ol9Vgz4P+dreKE9eCfmWOXCod04y8tJnVPs83wUOZfmKV
+dHkELaMVwz3fa40QP77gK38K5i22aUgYk6dvhB+OgtatZ5tk0Dxp3AI2OObngEUm
+4cGmQLwcses53vApwkExq427gS8td4sTE2G1D4+hSdEcm8Fj69w4Ado/DlIAHZob
+KLV15qtNOyaIapT4GxBqoeqsw7tnRmxiP8K8UxFcPV/vZC1yQKIIULigPjttZKoW
++REE2N7ZyPvbvgItwjAL8hpCeYEkd7RDa7ofHAv6icC1qSsJZ9gxFM4rJvriI4g2
+tnYEvZduGpBunhlwVb0R3kAF5XoLIZQ5qm6kyWAzioc0gxzYVc3Rd+bXjm+vmopt
+bXHOM6N2lLQwqnWlHsyjGVFugrkkRXZbQbIV6FynXpKaz5YtkUhUMkofz7mOYhBi
++1Z8nZ04B6YLbw==
+=85FX
+-----END PGP SIGNATURE-----

Added: dev/spark/v4.0.0-preview1-rc1-bin/pyspark-4.0.0.dev1.tar.gz.sha512
==============================================================================
--- dev/spark/v4.0.0-preview1-rc1-bin/pyspark-4.0.0.dev1.tar.gz.sha512 (added)
+++ dev/spark/v4.0.0-preview1-rc1-bin/pyspark-4.0.0.dev1.tar.gz.sha512 Thu May 9 16:31:11 2024
@@ -0,0 +1 @@
+2509cf6473495b0cd5c132d87f5e1c33593fa7375ca01bcab1483093cea92bdb6ad7afc7c72095376b28fc5acdc71bb323935d17513f33ee5276c6991ff668d1 pyspark-4.0.0.dev1.tar.gz

Added: dev/spark/v4.0.0-preview1-rc1-bin/pyspark-connect-4.0.0.dev1.tar.gz
==============================================================================
Binary file - no diff available.
Propchange: dev/spark/v4.0.0-preview1-rc1-bin/pyspark-connect-4.0.0.dev1.tar.gz
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/spark/v4.0.0-preview1-rc1-bin/pyspark-connect-4.0.0.dev1.tar.gz.asc
==============================================================================
--- dev/spark/v4.0.0-preview1-rc1-bin/pyspark-connect-4.0.0.dev1.tar.gz.asc (added)
+++ dev/spark/v4.0.0-preview1-rc1-bin/pyspark-connect-4.0.0.dev1.tar.gz.asc Thu May 9 16:31:11 2024
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQJGBAABCgAxFiEETclnbO+ag+mPygJ4TWYghDzYf1oFAmY8+fATHHdlbmNoZW5A
+YXBhY2hlLm9yZwAKCRBNZiCEPNh/WoCMD/iZjkaGTUqt3jkIjWIUzpQo+kLn8//m
+f+hwUtAguXvbMJXwBOz/Q/f+KvGk0tutsbd6rmBB6cHjH4GoZPp1x6iBitFAO47r
+kHy/0xYkb70SPQCWIGQQpRv3g0uxTmpqL9H4YcIvexkV2wXG5VSwGvbSI4596n7l
+x7M3rRmFzrxhcNIYLQdhNuat0mwuJFWe6R7Zk7UYFFishn9dNt8EOYx8vsGAuMP8
+Uy3+7oZQOAGqdQGSL7Ev4Pqve7MrrPgGXaixGukXibi707NCURnHTDcenPfoEEiQ
+Hj83I3G+JrRhtsue/103a/GnHheUgwE8oEkefnUX7qC5tSn4T8lI2KpDBv9AL1pm
+Bv0eXf5X5xEM4wvO7DCgbeEDPLg72jjt9X8zjAYx05HddvTuPjeKEL+Ga6G0ueTz
+HRXHrgd1EFZ1znPZhWiSTmeqZTXdrb6wKTYt8Y6mk1oEGL3b0qE2LNkSED+4l40u
+41MlV3pmZyjRGYZl29XZKf4isKYyjec7UbJSM5ok4zCRF0p8Gvj0EihGS4X6rYpW
+9XxwjViKMIp7DCEcWjWpO6pJ8Ygb2Snh1UTFFgtzSVAoMqUgHnBHejJ4RA4ncHu6
(spark) branch master updated: [SPARK-47409][SQL] Add support for collation for StringTrim type of functions/expressions (for UTF8_BINARY & LCASE)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 21333f8c1fc0 [SPARK-47409][SQL] Add support for collation for StringTrim type of functions/expressions (for UTF8_BINARY & LCASE)

21333f8c1fc0 is described below

commit 21333f8c1fc01756e6708ad6ccf21f585fcb881d
Author: David Milicevic
AuthorDate: Thu May 9 23:05:20 2024 +0800

    [SPARK-47409][SQL] Add support for collation for StringTrim type of functions/expressions (for UTF8_BINARY & LCASE)

    Recreating the [original PR](https://github.com/apache/spark/pull/45749) because the code has been reorganized in [this PR](https://github.com/apache/spark/pull/45978).

    ### What changes were proposed in this pull request?
    This PR adds support for collations to the StringTrim family of functions/expressions, specifically:
    - `StringTrim`
    - `StringTrimBoth`
    - `StringTrimLeft`
    - `StringTrimRight`

    Changes:
    - `CollationSupport.java` - add new `StringTrim`, `StringTrimLeft` and `StringTrimRight` classes with the corresponding logic.
    - `CollationAwareUTF8String` - add new `trim`, `trimLeft` and `trimRight` methods that actually implement the trim logic.
    - `UTF8String.java` - expose some of the methods publicly.
    - `stringExpressions.scala` - change input types; change eval and codegen logic.
    - `CollationTypeCasts.scala` - add `StringTrim*` expressions to the `CollationTypeCasts` rules.

    ### Why are the changes needed?
    We are incrementally adding collation support to built-in string functions in Spark.

    ### Does this PR introduce _any_ user-facing change?
    Yes:
    - Users should now be able to use non-default collations in string trim functions.

    ### How was this patch tested?
    Already existing tests + new unit/e2e tests.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

Closes #46206 from davidm-db/string-trim-functions.

Authored-by: David Milicevic
Signed-off-by: Wenchen Fan
---
 .../catalyst/util/CollationAwareUTF8String.java    | 470 ++
 .../spark/sql/catalyst/util/CollationSupport.java  | 534 -
 .../org/apache/spark/unsafe/types/UTF8String.java  |   2 +-
 .../spark/unsafe/types/CollationSupportSuite.java  | 193
 .../sql/catalyst/analysis/CollationTypeCasts.scala |   2 +-
 .../catalyst/expressions/stringExpressions.scala   |  53 +-
 .../sql/CollationStringExpressionsSuite.scala      | 161 ++-
 7 files changed, 1054 insertions(+), 361 deletions(-)

diff --git a/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java b/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java
new file mode 100644
index ..ee0d611d7e65
--- /dev/null
+++ b/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java
@@ -0,0 +1,470 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.
You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.util;
+
+import com.ibm.icu.lang.UCharacter;
+import com.ibm.icu.text.BreakIterator;
+import com.ibm.icu.text.StringSearch;
+import com.ibm.icu.util.ULocale;
+
+import org.apache.spark.unsafe.UTF8StringBuilder;
+import org.apache.spark.unsafe.types.UTF8String;
+
+import static org.apache.spark.unsafe.Platform.BYTE_ARRAY_OFFSET;
+import static org.apache.spark.unsafe.Platform.copyMemory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Utility class for collation-aware UTF8String operations.
+ */
+public class CollationAwareUTF8String {
+  public static UTF8String replace(final UTF8String src, final UTF8String search,
+      final UTF8String replace, final int collationId) {
+    // This collation aware implementation is based on existing implementation on UTF8String
+    if (src.numBytes() == 0 || search.numBytes() == 0) {
+      return src;
+    }
+
+    StringSearch stringSearch = CollationFactory.getStringSearch(src, search,
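To make the user-facing effect concrete, here is a hedged spark-shell sketch of what collation-aware trimming enables. The collation name `UTF8_BINARY_LCASE` and the exact `COLLATE` placement are assumptions based on the preview-era collation work, not taken from this commit:

```scala
// Hypothetical usage sketch: assumes a running SparkSession `spark` with
// collation support enabled; the collation name is an assumption.
val df = spark.sql(
  """SELECT TRIM(BOTH 'x' FROM 'xXSparkXx' COLLATE UTF8_BINARY_LCASE) AS trimmed""")
df.show()
// Under a case-insensitive collation, both the leading 'x'/'X' and the
// trailing 'X'/'x' match the trim string, so the expected result is 'Spark'.
// Under the default UTF8_BINARY collation, only the exact 'x' characters
// would be trimmed, leaving 'XSparkX'.
```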