(spark) branch branch-3.4 updated: [SPARK-47428][BUILD][3.4] Upgrade Jetty to 9.4.54.v20240208
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 3c41b1d97e1f [SPARK-47428][BUILD][3.4] Upgrade Jetty to 9.4.54.v20240208

3c41b1d97e1f is described below

commit 3c41b1d97e1f5ff9f74f9ea72f7ea92dcbca2122
Author: Dongjoon Hyun
AuthorDate: Fri Mar 15 22:42:17 2024 -0700

    [SPARK-47428][BUILD][3.4] Upgrade Jetty to 9.4.54.v20240208

    ### What changes were proposed in this pull request?

    This PR aims to upgrade Jetty to 9.4.54.v20240208 for Apache Spark 3.4.3.

    ### Why are the changes needed?

    To bring the latest bug fixes.

    - https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.54.v20240208
    - https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.53.v20231009
    - https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.52.v20230823
    - https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.51.v20230217

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass the CIs.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #45544 from dongjoon-hyun/SPARK-47428-3.4.
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-2-hive-2.3 | 2 +- dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++-- pom.xml | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 b/dev/deps/spark-deps-hadoop-2-hive-2.3 index 691c83632b38..a94fbcd0ca77 100644 --- a/dev/deps/spark-deps-hadoop-2-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-2-hive-2.3 @@ -143,7 +143,7 @@ jersey-hk2/2.36//jersey-hk2-2.36.jar jersey-server/2.36//jersey-server-2.36.jar jetty-sslengine/6.1.26//jetty-sslengine-6.1.26.jar jetty-util/6.1.26//jetty-util-6.1.26.jar -jetty-util/9.4.50.v20221201//jetty-util-9.4.50.v20221201.jar +jetty-util/9.4.54.v20240208//jetty-util-9.4.54.v20240208.jar jetty/6.1.26//jetty-6.1.26.jar jline/2.14.6//jline-2.14.6.jar joda-time/2.12.2//joda-time-2.12.2.jar diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 4d94cb5c699e..99665da7d16a 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -128,8 +128,8 @@ jersey-container-servlet/2.36//jersey-container-servlet-2.36.jar jersey-hk2/2.36//jersey-hk2-2.36.jar jersey-server/2.36//jersey-server-2.36.jar jettison/1.1//jettison-1.1.jar -jetty-util-ajax/9.4.50.v20221201//jetty-util-ajax-9.4.50.v20221201.jar -jetty-util/9.4.50.v20221201//jetty-util-9.4.50.v20221201.jar +jetty-util-ajax/9.4.54.v20240208//jetty-util-ajax-9.4.54.v20240208.jar +jetty-util/9.4.54.v20240208//jetty-util-9.4.54.v20240208.jar jline/2.14.6//jline-2.14.6.jar joda-time/2.12.2//joda-time-2.12.2.jar jodd-core/3.5.2//jodd-core-3.5.2.jar diff --git a/pom.xml b/pom.xml index 373d17b76c09..77218d162c41 100644 --- a/pom.xml +++ b/pom.xml @@ -143,7 +143,7 @@ 1.12.3 1.8.6 shaded-protobuf -9.4.50.v20221201 +9.4.54.v20240208 4.0.3 0.10.0 2.5.1 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 210e80e8b7ba [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job

210e80e8b7ba is described below

commit 210e80e8b7baa5fc1e6462615bc8134a4c90647c
Author: Dongjoon Hyun
AuthorDate: Tue Oct 17 23:38:56 2023 -0700

    [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job

    ### What changes were proposed in this pull request?

    This PR aims to skip the `Unidoc` and `MIMA` phases in many general test pipelines. The `mima` test is moved to the `lint` job.

    ### Why are the changes needed?

    By having an independent document-generation and mima-checking GitHub Action job, we can skip them in the following many jobs.

    https://github.com/apache/spark/blob/73f9f5296e36541db78ab10c4c01a56fbc17cca8/.github/workflows/build_and_test.yml#L142-L190

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Manually checked the GitHub Action logs.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #43422 from dongjoon-hyun/SPARK-45587.
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit 8c6eeb8ab0180368cc60de8b2dbae7457bee5794) Signed-off-by: Dongjoon Hyun --- .github/workflows/build_and_test.yml | 4 1 file changed, 4 insertions(+) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index 13527119e51a..33747fb5b61d 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -198,6 +198,8 @@ jobs: HIVE_PROFILE: ${{ matrix.hive }} GITHUB_PREV_SHA: ${{ github.event.before }} SPARK_LOCAL_IP: localhost + SKIP_UNIDOC: true + SKIP_MIMA: true SKIP_PACKAGING: true steps: - name: Checkout Spark repository @@ -578,6 +580,8 @@ jobs: run: ./dev/check-license - name: Dependencies test run: ./dev/test-dependencies.sh +- name: MIMA test + run: ./dev/mima - name: Scala linter run: ./dev/lint-scala - name: Java linter
(spark) branch branch-3.5 updated: [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 8c6eeb8ab018 [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job

8c6eeb8ab018 is described below

commit 8c6eeb8ab0180368cc60de8b2dbae7457bee5794
Author: Dongjoon Hyun
AuthorDate: Tue Oct 17 23:38:56 2023 -0700

    [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job

    ### What changes were proposed in this pull request?

    This PR aims to skip the `Unidoc` and `MIMA` phases in many general test pipelines. The `mima` test is moved to the `lint` job.

    ### Why are the changes needed?

    By having an independent document-generation and mima-checking GitHub Action job, we can skip them in the following many jobs.

    https://github.com/apache/spark/blob/73f9f5296e36541db78ab10c4c01a56fbc17cca8/.github/workflows/build_and_test.yml#L142-L190

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Manually checked the GitHub Action logs.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #43422 from dongjoon-hyun/SPARK-45587.
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .github/workflows/build_and_test.yml | 4 1 file changed, 4 insertions(+) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index ad8685754b31..b0760a955342 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -204,6 +204,8 @@ jobs: HIVE_PROFILE: ${{ matrix.hive }} GITHUB_PREV_SHA: ${{ github.event.before }} SPARK_LOCAL_IP: localhost + SKIP_UNIDOC: true + SKIP_MIMA: true SKIP_PACKAGING: true steps: - name: Checkout Spark repository @@ -627,6 +629,8 @@ jobs: run: ./dev/check-license - name: Dependencies test run: ./dev/test-dependencies.sh +- name: MIMA test + run: ./dev/mima - name: Scala linter run: ./dev/lint-scala - name: Java linter
(spark) branch branch-3.5 updated: [SPARK-47428][BUILD][3.5] Upgrade Jetty to 9.4.54.v20240208
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new d59425275cdd [SPARK-47428][BUILD][3.5] Upgrade Jetty to 9.4.54.v20240208

d59425275cdd is described below

commit d59425275cdd0ff678a5bcccef4c7b74fe8170cb
Author: Dongjoon Hyun
AuthorDate: Fri Mar 15 22:28:45 2024 -0700

    [SPARK-47428][BUILD][3.5] Upgrade Jetty to 9.4.54.v20240208

    ### What changes were proposed in this pull request?

    This PR aims to upgrade Jetty to 9.4.54.v20240208.

    ### Why are the changes needed?

    To bring the latest bug fixes.

    - https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.54.v20240208
    - https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.53.v20231009

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass the CIs.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #45543 from dongjoon-hyun/SPARK-47428.
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++-- pom.xml | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index c76702cd0af0..8ecf931bf513 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -130,8 +130,8 @@ jersey-container-servlet/2.40//jersey-container-servlet-2.40.jar jersey-hk2/2.40//jersey-hk2-2.40.jar jersey-server/2.40//jersey-server-2.40.jar jettison/1.1//jettison-1.1.jar -jetty-util-ajax/9.4.52.v20230823//jetty-util-ajax-9.4.52.v20230823.jar -jetty-util/9.4.52.v20230823//jetty-util-9.4.52.v20230823.jar +jetty-util-ajax/9.4.54.v20240208//jetty-util-ajax-9.4.54.v20240208.jar +jetty-util/9.4.54.v20240208//jetty-util-9.4.54.v20240208.jar jline/2.14.6//jline-2.14.6.jar joda-time/2.12.5//joda-time-2.12.5.jar jodd-core/3.5.2//jodd-core-3.5.2.jar diff --git a/pom.xml b/pom.xml index 5db3c78e00eb..fb6208777d3f 100644 --- a/pom.xml +++ b/pom.xml @@ -143,7 +143,7 @@ 1.13.1 1.9.2 shaded-protobuf -9.4.52.v20230823 +9.4.54.v20240208 4.0.3 0.10.0
(spark) branch master updated: [SPARK-47327][SQL] Move sort keys concurrency test to CollationFactorySuite
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 6719168b6ec7 [SPARK-47327][SQL] Move sort keys concurrency test to CollationFactorySuite

6719168b6ec7 is described below

commit 6719168b6ec72242e111bcb3aae75985d36fdad2
Author: Stefan Kandic
AuthorDate: Sat Mar 16 09:24:22 2024 +0500

    [SPARK-47327][SQL] Move sort keys concurrency test to CollationFactorySuite

    ### What changes were proposed in this pull request?

    Move the concurrency test to the `CollationFactorySuite`.

    ### Why are the changes needed?

    This is a more appropriate location for the test, as it directly uses the `CollationFactory`. Also, the `par` method is highly discouraged, and we should use `ParSeq` instead.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    With existing UTs.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #45501 from stefankandic/moveTest.
Authored-by: Stefan Kandic Signed-off-by: Max Gekk --- common/unsafe/pom.xml | 6 ++ .../apache/spark/unsafe/types/CollationFactorySuite.scala | 14 ++ .../test/scala/org/apache/spark/sql/CollationSuite.scala | 14 -- 3 files changed, 20 insertions(+), 14 deletions(-) diff --git a/common/unsafe/pom.xml b/common/unsafe/pom.xml index e9785ebb7ad4..13b45f55a4ad 100644 --- a/common/unsafe/pom.xml +++ b/common/unsafe/pom.xml @@ -47,6 +47,12 @@ ${project.version} + + org.scala-lang.modules + scala-parallel-collections_${scala.binary.version} + test + + com.ibm.icu icu4j diff --git a/common/unsafe/src/test/scala/org/apache/spark/unsafe/types/CollationFactorySuite.scala b/common/unsafe/src/test/scala/org/apache/spark/unsafe/types/CollationFactorySuite.scala index f9927b94fd42..0a9ff7558e3a 100644 --- a/common/unsafe/src/test/scala/org/apache/spark/unsafe/types/CollationFactorySuite.scala +++ b/common/unsafe/src/test/scala/org/apache/spark/unsafe/types/CollationFactorySuite.scala @@ -17,6 +17,7 @@ package org.apache.spark.unsafe.types +import scala.collection.parallel.immutable.ParSeq import scala.jdk.CollectionConverters.MapHasAsScala import org.apache.spark.SparkException @@ -138,4 +139,17 @@ class CollationFactorySuite extends AnyFunSuite with Matchers { // scalastyle:ig assert(result == testCase.expectedResult) }) } + + test("test concurrently generating collation keys") { +// generating ICU sort keys is not thread-safe by default so this should fail +// if we don't handle the concurrency properly on Collator level + +(0 to 10).foreach(_ => { + val collator = fetchCollation("UNICODE").collator + + ParSeq(0 to 100).foreach { _ => +collator.getCollationKey("aaa") + } +}) + } } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala index bef7417be36c..aaf3e88c9bdb 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala +++ 
b/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala @@ -18,7 +18,6 @@ package org.apache.spark.sql import scala.collection.immutable.Seq -import scala.collection.parallel.CollectionConverters.ImmutableIterableIsParallelizable import scala.jdk.CollectionConverters.MapHasAsJava import org.apache.spark.SparkException @@ -413,19 +412,6 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlanHelper { } } - test("test concurrently generating collation keys") { -// generating ICU sort keys is not thread-safe by default so this should fail -// if we don't handle the concurrency properly on Collator level - -(0 to 10).foreach(_ => { - val collator = CollationFactory.fetchCollation("UNICODE").collator - - (0 to 100).par.foreach { _ => -collator.getCollationKey("aaa") - } -}) - } - test("text writing to parquet with collation enclosed with backticks") { withTempPath{ path => sql(s"select 'a' COLLATE `UNICODE`").write.parquet(path.getAbsolutePath)
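For readers unfamiliar with the change above: parallel collections were moved out of the Scala standard library in 2.13, so the deprecated `.par` extension is replaced by an explicit `ParSeq` from the `scala-parallel-collections` module (the test-scoped dependency the commit adds to `common/unsafe/pom.xml`). A minimal stand-alone sketch of the pattern — a toy, not Spark code; a thread-safe queue stands in for the shared ICU collator:

```scala
import java.util.concurrent.ConcurrentLinkedQueue

import scala.collection.parallel.immutable.ParSeq

object ParSeqSketch {
  def main(args: Array[String]): Unit = {
    // Thread-safe sink so the parallel foreach can record its work.
    val seen = new ConcurrentLinkedQueue[Int]()

    // Note a subtlety of the apply method: ParSeq(0 to 100) is a
    // ONE-element parallel sequence holding the Range itself, while
    // ParSeq((0 to 100): _*) has 101 elements and actually fans the
    // work out across threads.
    ParSeq((0 to 100): _*).foreach { i => seen.add(i) }

    assert(seen.size == 101)
  }
}
```

Running this requires the `org.scala-lang.modules:scala-parallel-collections` artifact on the classpath for Scala 2.13.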
(spark) branch master updated: [SPARK-47423][SQL] Collations - Set operation support for strings with collations
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 653ac5b729e2 [SPARK-47423][SQL] Collations - Set operation support for strings with collations

653ac5b729e2 is described below

commit 653ac5b729e2eba9bf097905b3fd136603b7a298
Author: Aleksandar Tomic
AuthorDate: Sat Mar 16 09:21:08 2024 +0500

    [SPARK-47423][SQL] Collations - Set operation support for strings with collations

    ### What changes were proposed in this pull request?

    This PR fixes support for set operations for strings with collations different from `UTF8_BINARY`. The fix is not strictly related to set operations and may resolve other problems in the collation space.

    The fix is to add a default value for `StringType` with collation. Previously, the matching pattern would not catch the `StringType`-with-collation case; the fix is simply to pattern match on `st: StringType` instead of relying on a `StringType` match.

    ### Why are the changes needed?

    Fixing the behaviour of set operations.

    ### Does this PR introduce _any_ user-facing change?

    Yes - fixing logic that previously didn't work.

    ### How was this patch tested?

    Golden file tests are added.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #45536 from dbatomic/collations_and_set_ops.
Authored-by: Aleksandar Tomic Signed-off-by: Max Gekk --- .../spark/sql/catalyst/expressions/literals.scala | 2 +- .../sql-tests/analyzer-results/collations.sql.out | 51 + .../test/resources/sql-tests/inputs/collations.sql | 7 +++ .../resources/sql-tests/results/collations.sql.out | 53 ++ 4 files changed, 112 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala index 9603647db06f..eadd4c04f4b3 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala @@ -195,7 +195,7 @@ object Literal { case TimestampNTZType => create(0L, TimestampNTZType) case it: DayTimeIntervalType => create(0L, it) case it: YearMonthIntervalType => create(0, it) -case StringType => Literal("") +case st: StringType => Literal(UTF8String.fromString(""), st) case BinaryType => Literal("".getBytes(StandardCharsets.UTF_8)) case CalendarIntervalType => Literal(new CalendarInterval(0, 0, 0)) case arr: ArrayType => create(Array(), arr) diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/collations.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/collations.sql.out index fff2d4eab717..6d9bb3470be6 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/collations.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/collations.sql.out @@ -149,6 +149,57 @@ DropTable false, false +- ResolvedIdentifier V2SessionCatalog(spark_catalog), default.t1 +-- !query +select col1 collate utf8_binary_lcase from values ('aaa'), ('AAA'), ('bbb'), ('BBB'), ('zzz'), ('ZZZ') except select col1 collate utf8_binary_lcase from values ('aaa'), ('bbb') +-- !query analysis +Except false +:- Project [collate(col1#x, utf8_binary_lcase) AS collate(col1)#x] +: +- LocalRelation [col1#x] 
++- Project [collate(col1#x, utf8_binary_lcase) AS collate(col1)#x] + +- LocalRelation [col1#x] + + +-- !query +select col1 collate utf8_binary_lcase from values ('aaa'), ('AAA'), ('bbb'), ('BBB'), ('zzz'), ('ZZZ') except all select col1 collate utf8_binary_lcase from values ('aaa'), ('bbb') +-- !query analysis +Except All true +:- Project [collate(col1#x, utf8_binary_lcase) AS collate(col1)#x] +: +- LocalRelation [col1#x] ++- Project [collate(col1#x, utf8_binary_lcase) AS collate(col1)#x] + +- LocalRelation [col1#x] + + +-- !query +select col1 collate utf8_binary_lcase from values ('aaa'), ('AAA'), ('bbb'), ('BBB'), ('zzz'), ('ZZZ') union select col1 collate utf8_binary_lcase from values ('aaa'), ('bbb') +-- !query analysis +Distinct ++- Union false, false + :- Project [collate(col1#x, utf8_binary_lcase) AS collate(col1)#x] + : +- LocalRelation [col1#x] + +- Project [collate(col1#x, utf8_binary_lcase) AS collate(col1)#x] + +- LocalRelation [col1#x] + + +-- !query +select col1 collate utf8_binary_lcase from values ('aaa'), ('AAA'), ('bbb'), ('BBB'), ('zzz'), ('ZZZ') union all select col1 collate utf8_binary_lcase from values ('aaa'), ('bbb') +-- !query analysis +Union false, false +:- Project [collate(col1#x, utf8_binary_lcase) AS collate(col1)#x] +: +- LocalRelation [col1#x] ++- Pro
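The one-line fix in the diff above hinges on Scala pattern-matching semantics: `case StringType =>` matches only the single default instance, so a collated `StringType` falls through, whereas the typed pattern `case st: StringType =>` matches any instance and preserves its collation. A toy illustration with simplified stand-in types (not Spark's actual classes):

```scala
object LiteralDefaultSketch {
  // Simplified stand-ins for Spark's types, for illustration only.
  class MyStringType(val collationId: Int)
  object Default extends MyStringType(0) // plays the role of the default UTF8_BINARY instance

  def collationOf(dt: MyStringType): Int = dt match {
    // A stable-identifier pattern like `case Default =>` would match only
    // that one instance; the typed pattern below matches every instance,
    // collated or not, and keeps its collation id — the shape of the fix.
    case st: MyStringType => st.collationId
  }

  def main(args: Array[String]): Unit = {
    assert(collationOf(Default) == 0)          // default instance still handled
    assert(collationOf(new MyStringType(42)) == 42) // collated instance no longer falls through
  }
}
```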
(spark) branch master updated (7c81bdf1ed17 -> add49b3c115f)
This is an automated email from the ASF dual-hosted git repository.

ueshin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 7c81bdf1ed17 [SPARK-47345][SQL][TESTS] Xml functions suite
     add add49b3c115f [SPARK-47346][PYTHON] Make daemon mode configurable when creating Python planner workers

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/spark/SparkEnv.scala | 27 ++
 .../spark/api/python/PythonWorkerFactory.scala     | 12 +-
 .../spark/api/python/StreamingPythonRunner.scala   | 18 +--
 .../api/python/PythonWorkerFactorySuite.scala      |  2 +-
 .../sql/execution/python/PythonPlannerRunner.scala |  4 ++--
 .../python/PythonStreamingSourceRunner.scala       |  2 +-
 6 files changed, 39 insertions(+), 26 deletions(-)
(spark) branch master updated: [SPARK-47345][SQL][TESTS] Xml functions suite
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 7c81bdf1ed17 [SPARK-47345][SQL][TESTS] Xml functions suite

7c81bdf1ed17 is described below

commit 7c81bdf1ed17df31ec6d7a3ee9f18b73d8ae2bd6
Author: Yousof Hosny
AuthorDate: Fri Mar 15 22:56:29 2024 +0500

    [SPARK-47345][SQL][TESTS] Xml functions suite

    ### What changes were proposed in this pull request?

    Convert JsonFunctionsSuite.scala to an XML equivalent. Note that XML doesn't implement all JSON functions like json_tuple, get_json_object, etc.

    ### Why are the changes needed?

    Improve unit test coverage.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Unit tests.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #45466 from yhosny/xml-functions-suite.

Authored-by: Yousof Hosny
Signed-off-by: Max Gekk
--- .../org/apache/spark/sql/XmlFunctionsSuite.scala | 480 + 1 file changed, 480 insertions(+) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala new file mode 100644 index ..fcfbebaa61ec --- /dev/null +++ b/sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala @@ -0,0 +1,480 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License.
You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import java.text.SimpleDateFormat +import java.util.Locale + +import scala.jdk.CollectionConverters._ + +import org.apache.spark.sql.functions._ +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.test.SharedSparkSession +import org.apache.spark.sql.types._ + +class XmlFunctionsSuite extends QueryTest with SharedSparkSession { + import testImplicits._ + + test("from_xml") { +val df = Seq("""1""").toDS() +val schema = new StructType().add("a", IntegerType) + +checkAnswer( + df.select(from_xml($"value", schema)), + Row(Row(1)) :: Nil) + } + + test("from_xml with option (timestampFormat)") { +val df = Seq("""26/08/2015 18:00""").toDS() +val schema = new StructType().add("time", TimestampType) +val options = Map("timestampFormat" -> "dd/MM/ HH:mm").asJava + +checkAnswer( + df.select(from_xml($"value", schema, options)), + Row(Row(java.sql.Timestamp.valueOf("2015-08-26 18:00:00.0" + } + + test("from_xml with option (rowTag)") { +val df = Seq("""1""").toDS() +val schema = new StructType().add("a", IntegerType) +val options = Map("rowTag" -> "foo").asJava + +checkAnswer( + df.select(from_xml($"value", schema)), + Row(Row(1)) :: Nil) + } + + test("from_xml with option (dateFormat)") { +val df = Seq("""26/08/2015""").toDS() +val schema = new StructType().add("time", DateType) +val options = Map("dateFormat" -> "dd/MM/").asJava + +checkAnswer( + df.select(from_xml($"value", schema, options)), + Row(Row(java.sql.Date.valueOf("2015-08-26" + } + + test("from_xml missing columns") { +val df = 
Seq("""1""").toDS() +val schema = new StructType().add("b", IntegerType) + +checkAnswer( + df.select(from_xml($"value", schema)), + Row(Row(null)) :: Nil) + } + + test("from_xml invalid xml") { +val df = Seq("""1""").toDS() +val schema = new StructType().add("a", IntegerType) + +checkAnswer( + df.select(from_xml($"value", schema)), + Row(Row(null)) :: Nil) + } + + test("from_xml - xml doesn't conform to the array type") { +val df = Seq("""1""").toDS() +val schema = StructType(StructField("a", ArrayType(IntegerType)) :: Nil) + +checkAnswer(df.select(from_xml($"value", schema)), Row(Row(null))) + } + + test("from_xml array support") { +val df = Seq(s""" 1 2 """.stripMargin).toDS() +val schema = StructType(StructField("a", ArrayType(IntegerType)) :: Nil) + +checkAnswer( + df.select(from_xml($"value", schema)), + Row
(spark) branch master updated (6bf031796c8c -> e2c0471476ea)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 6bf031796c8c [SPARK-44740][CONNECT][TESTS][FOLLOWUP] Deduplicate `test_metadata` add e2c0471476ea [SPARK-47395] Add collate and collation to other APIs No new revisions were added by this update. Summary of changes: R/pkg/NAMESPACE| 2 + R/pkg/R/functions.R| 29 R/pkg/R/generics.R | 8 R/pkg/tests/fulltests/test_sparkSQL.R | 2 + .../scala/org/apache/spark/sql/functions.scala | 16 +++ .../apache/spark/sql/PlanGenerationTestSuite.scala | 8 ...ulls_first.explain => function_collate.explain} | 2 +- ...ls_first.explain => function_collation.explain} | 2 +- .../{column_rlike.json => function_collate.json} | 4 +- ...sition.proto.bin => function_collate.proto.bin} | Bin 189 -> 189 bytes ...{column_isNull.json => function_collation.json} | 2 +- ..._geq.proto.bin => function_collation.proto.bin} | Bin 178 -> 178 bytes .../source/reference/pyspark.sql/functions.rst | 2 + python/pyspark/sql/connect/functions/builtin.py| 14 ++ python/pyspark/sql/functions/builtin.py| 52 + python/pyspark/sql/tests/test_functions.py | 5 ++ .../scala/org/apache/spark/sql/functions.scala | 16 +++ 17 files changed, 159 insertions(+), 5 deletions(-) copy connector/connect/common/src/test/resources/query-tests/explain-results/{column_asc_nulls_first.explain => function_collate.explain} (57%) copy connector/connect/common/src/test/resources/query-tests/explain-results/{column_asc_nulls_first.explain => function_collation.explain} (61%) copy connector/connect/common/src/test/resources/query-tests/queries/{column_rlike.json => function_collate.json} (89%) copy connector/connect/common/src/test/resources/query-tests/queries/{function_bitmap_bit_position.proto.bin => function_collate.proto.bin} (85%) copy connector/connect/common/src/test/resources/query-tests/queries/{column_isNull.json => function_collation.json} (93%) copy 
connector/connect/common/src/test/resources/query-tests/queries/{column_geq.proto.bin => function_collation.proto.bin} (90%)
(spark) branch master updated (4437e6e21237 -> 6bf031796c8c)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 4437e6e21237 [SPARK-47419][CONNECT] Move `log4j2-defaults.properties` to `common/utils`
     add 6bf031796c8c [SPARK-44740][CONNECT][TESTS][FOLLOWUP] Deduplicate `test_metadata`

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/tests/connect/test_connect_session.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated (b7aa9740249b -> 4437e6e21237)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from b7aa9740249b [SPARK-47407][SQL] Support java.sql.Types.NULL map to NullType
     add 4437e6e21237 [SPARK-47419][CONNECT] Move `log4j2-defaults.properties` to `common/utils`

No new revisions were added by this update.

Summary of changes:
 .../utils}/src/main/resources/org/apache/spark/log4j2-defaults.properties | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename {core => common/utils}/src/main/resources/org/apache/spark/log4j2-defaults.properties (100%)
(spark) branch master updated (4b1f8c3d779b -> b7aa9740249b)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 4b1f8c3d779b [SPARK-47399][SQL] Disable generated columns on expressions with collations
     add b7aa9740249b [SPARK-47407][SQL] Support java.sql.Types.NULL map to NullType

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/jdbc/PostgresIntegrationSuite.scala | 16 +++-
 .../spark/sql/execution/datasources/jdbc/JdbcUtils.scala |  6 +-
 .../org/apache/spark/sql/jdbc/PostgresDialect.scala      |  1 +
 3 files changed, 21 insertions(+), 2 deletions(-)
(spark) branch master updated: [SPARK-47399][SQL] Disable generated columns on expressions with collations
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4b1f8c3d779b [SPARK-47399][SQL] Disable generated columns on expressions with collations

4b1f8c3d779b is described below

commit 4b1f8c3d779b1391b414d6d6791bed5800b600bd
Author: Stefan Kandic
AuthorDate: Fri Mar 15 16:12:40 2024 +0500

    [SPARK-47399][SQL] Disable generated columns on expressions with collations

    ### What changes were proposed in this pull request?

    Disable the ability to use collations in expressions for generated columns.

    ### Why are the changes needed?

    Changing the collation of a column, or even just changing the ICU version, could lead to differences in the resulting expression, so it would be best to simply disable it for now.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    With new unit tests.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #45520 from stefankandic/disableGeneratedColumnsCollation.
Authored-by: Stefan Kandic Signed-off-by: Max Gekk --- .../spark/sql/catalyst/util/GeneratedColumn.scala | 5 ++ .../org/apache/spark/sql/CollationSuite.scala | 53 ++ 2 files changed, 58 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GeneratedColumn.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GeneratedColumn.scala index 28ddc16cf6b0..747a0e225a2f 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GeneratedColumn.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GeneratedColumn.scala @@ -29,6 +29,7 @@ import org.apache.spark.sql.connector.catalog.{CatalogManager, Identifier, Table import org.apache.spark.sql.errors.QueryCompilationErrors import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types.{DataType, StructField, StructType} +import org.apache.spark.sql.util.SchemaUtils /** * This object contains utility methods and values for Generated Columns @@ -162,6 +163,10 @@ object GeneratedColumn { s"generation expression data type ${analyzed.dataType.simpleString} " + s"is incompatible with column data type ${dataType.simpleString}") } +if (analyzed.exists(e => SchemaUtils.hasNonDefaultCollatedString(e.dataType))) { + throw unsupportedExpressionError( +"generation expression cannot contain non-default collated string type") +} } /** diff --git a/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala index 72e72a53c4f6..bef7417be36c 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala @@ -622,4 +622,57 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlanHelper { case _: SortMergeJoinExec => () }.nonEmpty) } + + test("Generated column expressions using collations - errors out") { +checkError( + exception = intercept[AnalysisException] { 
+        sql(
+          s"""
+             |CREATE TABLE testcat.test_table(
+             |  c1 STRING COLLATE UNICODE,
+             |  c2 STRING COLLATE UNICODE GENERATED ALWAYS AS (SUBSTRING(c1, 0, 1))
+             |)
+             |USING $v2Source
+             |""".stripMargin)
+      },
+      errorClass = "UNSUPPORTED_EXPRESSION_GENERATED_COLUMN",
+      parameters = Map(
+        "fieldName" -> "c2",
+        "expressionStr" -> "SUBSTRING(c1, 0, 1)",
+        "reason" -> "generation expression cannot contain non-default collated string type"))
+
+    checkError(
+      exception = intercept[AnalysisException] {
+        sql(
+          s"""
+             |CREATE TABLE testcat.test_table(
+             |  c1 STRING COLLATE UNICODE,
+             |  c2 STRING COLLATE UNICODE GENERATED ALWAYS AS (c1 || 'a' COLLATE UNICODE)
+             |)
+             |USING $v2Source
+             |""".stripMargin)
+      },
+      errorClass = "UNSUPPORTED_EXPRESSION_GENERATED_COLUMN",
+      parameters = Map(
+        "fieldName" -> "c2",
+        "expressionStr" -> "c1 || 'a' COLLATE UNICODE",
+        "reason" -> "generation expression cannot contain non-default collated string type"))
+
+    checkError(
+      exception = intercept[AnalysisException] {
+        sql(
+          s"""
+             |CREATE TABLE testcat.test_table(
+             |  struct1 STRUCT,
+             |  c2 STRING COLLATE UNICODE GENERATED ALWAYS AS (SUBSTRING(struct1.a, 0, 1))
+             |)
+             |USING $v2Source
+             |""".stripMargin)
+      },
+      errorClass = "UNSUPPORTED_EXP
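The new guard in GeneratedColumn.scala delegates to `SchemaUtils.hasNonDefaultCollatedString`, which must recurse through nested types (so a collated string buried inside a struct, as in the third test above, is still rejected). A minimal, self-contained sketch of that recursion — the simplified `DataType` case classes and the default-collation name here are illustrative stand-ins, not Spark's actual type hierarchy:

```scala
// Simplified stand-ins for Spark's DataType hierarchy (illustrative only).
sealed trait DataType
case class StringType(collation: String = "UTF8_BINARY") extends DataType
case class IntegerType() extends DataType
case class ArrayType(elementType: DataType) extends DataType
case class MapType(keyType: DataType, valueType: DataType) extends DataType
case class StructType(fieldTypes: Seq[DataType]) extends DataType

// Returns true if any string type in the (possibly nested) data type
// carries a collation other than the default.
def hasNonDefaultCollatedString(dt: DataType): Boolean = dt match {
  case s: StringType       => s.collation != "UTF8_BINARY"
  case ArrayType(et)       => hasNonDefaultCollatedString(et)
  case MapType(kt, vt)     => hasNonDefaultCollatedString(kt) || hasNonDefaultCollatedString(vt)
  case StructType(fs)      => fs.exists(hasNonDefaultCollatedString)
  case _                   => false
}
```

For example, `hasNonDefaultCollatedString(StructType(Seq(IntegerType(), StringType("UNICODE"))))` returns `true`, matching the struct case exercised by the third `checkError` block.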
(spark) branch master updated: [SPARK-47406][SQL] Handle TIMESTAMP and DATETIME in MYSQLDialect
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 19ac1fc13646 [SPARK-47406][SQL] Handle TIMESTAMP and DATETIME in MYSQLDialect
19ac1fc13646 is described below

commit 19ac1fc13646e982ef76718b5e7a0f0e5147794e
Author: Kent Yao
AuthorDate: Fri Mar 15 16:38:08 2024 +0800

    [SPARK-47406][SQL] Handle TIMESTAMP and DATETIME in MYSQLDialect

    ### What changes were proposed in this pull request?

    In MySQL, TIMESTAMP and DATETIME are different: the former is a TIMESTAMP WITH LOCAL TIME ZONE and the latter is a TIMESTAMP WITHOUT TIME ZONE.

    Following [SPARK-47375](https://issues.apache.org/jira/browse/SPARK-47375), MySQL TIMESTAMP now maps directly to TimestampType, while DATETIME's mapping is decided by preferTimestampNTZ.

    ### Why are the changes needed?

    To align with the guidelines for JDBC timestamps.

    ### Does this PR introduce _any_ user-facing change?

    Yes, a migration guide entry is provided.

    ### How was this patch tested?

    New tests.

    ### Was this patch authored or co-authored using generative AI tooling?

    Closes #45530 from yaooqinn/SPARK-47406.
    Authored-by: Kent Yao
    Signed-off-by: Kent Yao
---
 .../spark/sql/jdbc/MySQLIntegrationSuite.scala     | 14 +++
 docs/sql-migration-guide.md                        |  1 +
 .../sql/execution/datasources/jdbc/JdbcUtils.scala | 10 +++--
 .../org/apache/spark/sql/jdbc/JdbcDialects.scala   |  7
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala   | 45 +-
 5 files changed, 55 insertions(+), 22 deletions(-)

diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
index 48b94cf28a63..b1d239337aa0 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
@@ -19,6 +19,7 @@ package org.apache.spark.sql.jdbc
 
 import java.math.BigDecimal
 import java.sql.{Connection, Date, Timestamp}
+import java.time.LocalDateTime
 import java.util.Properties
 
 import org.apache.spark.sql.Row
@@ -134,6 +135,19 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     }
   }
 
+  test("SPARK-47406: MySQL datetime types with preferTimestampNTZ") {
+    withDefaultTimeZone(UTC) {
+      val df = sqlContext.read.option("preferTimestampNTZ", true)
+        .jdbc(jdbcUrl, "dates", new Properties)
+      checkAnswer(df, Row(
+        Date.valueOf("1991-11-09"),
+        LocalDateTime.of(1970, 1, 1, 13, 31, 24),
+        LocalDateTime.of(1996, 1, 1, 1, 23, 45),
+        Timestamp.valueOf("2009-02-13 23:31:30"),
+        Date.valueOf("2001-01-01")))
+    }
+  }
+
   test("String types") {
     val df = sqlContext.read.jdbc(jdbcUrl, "strings", new Properties)
     val rows = df.collect()
diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 28fa19c351fc..27b62a6bd792 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -41,6 +41,7 @@ license: |
 - Since Spark 4.0, the SQL config `spark.sql.legacy.allowZeroIndexInFormatString` is deprecated. Consider to change `strfmt` of the `format_string` function to use 1-based indexes. The first argument must be referenced by "1$", the second by "2$", etc.
 - Since Spark 4.0, the function `to_csv` no longer supports input with the data type `STRUCT`, `ARRAY`, `MAP`, `VARIANT` and `BINARY` (because the `CSV specification` does not have standards for these data types and cannot be read back using `from_csv`), Spark will throw `DATATYPE_MISMATCH.UNSUPPORTED_INPUT_TYPE` exception.
 - Since Spark 4.0, JDBC read option `preferTimestampNTZ=true` will not convert Postgres TIMESTAMP WITH TIME ZONE and TIME WITH TIME ZONE data types to TimestampNTZType, which is available in Spark 3.5.
+- Since Spark 4.0, JDBC read option `preferTimestampNTZ=true` will not convert MySQL TIMESTAMP to TimestampNTZType, which is available in Spark 3.5. MySQL DATETIME is not affected.
 
 ## Upgrading from Spark SQL 3.4 to 3.5
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
index b037d862fa1a..393f09b6075e 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
@@ -167,6 +167,10 @@ object JdbcUtils extends Logging with SQLConfHelper {
       throw QueryExecutionErrors.cannotGetJdbcTypeError(dt))
   }
 
+  def g
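The mapping rule this commit introduces can be summarized in a few lines: MySQL TIMESTAMP is interpreted relative to the session time zone, so it always maps to Spark's TimestampType (with local time zone), while DATETIME carries no time zone and is the only type steered by the `preferTimestampNTZ` read option. The following is a hedged, self-contained sketch of that decision — the names `mapMySqlDatetime`, `CatalystType`, and the case objects are illustrative, not the actual MySQLDialect API:

```scala
// Illustrative stand-ins for Spark's Catalyst timestamp types.
sealed trait CatalystType
case object TimestampType extends CatalystType     // timestamp with local time zone
case object TimestampNTZType extends CatalystType  // timestamp without time zone

// Sketch of the mapping decision described in the commit message:
// TIMESTAMP ignores preferTimestampNTZ; DATETIME honors it.
def mapMySqlDatetime(mysqlTypeName: String, preferTimestampNTZ: Boolean): CatalystType =
  mysqlTypeName match {
    case "TIMESTAMP" => TimestampType
    case "DATETIME"  => if (preferTimestampNTZ) TimestampNTZType else TimestampType
  }
```

Under this rule, `mapMySqlDatetime("TIMESTAMP", preferTimestampNTZ = true)` still yields `TimestampType`, which is exactly the Spark 3.5 → 4.0 behavior change called out in the migration-guide entry above, while `mapMySqlDatetime("DATETIME", preferTimestampNTZ = true)` yields `TimestampNTZType`, matching the `LocalDateTime` values checked by the new integration test.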