(spark) branch master updated: [SPARK-46009][SQL][FOLLOWUP] Remove unused PERCENTILE_CONT and PERCENTILE_DISC in g4
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ad63eef20617 [SPARK-46009][SQL][FOLLOWUP] Remove unused PERCENTILE_CONT and PERCENTILE_DISC in g4 ad63eef20617 is described below commit ad63eef20617db7cdecce465af54e4787d0deeac Author: beliefer AuthorDate: Wed May 1 11:25:54 2024 -0700 [SPARK-46009][SQL][FOLLOWUP] Remove unused PERCENTILE_CONT and PERCENTILE_DISC in g4 ### What changes were proposed in this pull request? This PR propose to remove unused `PERCENTILE_CONT` and `PERCENTILE_DISC` in g4 ### Why are the changes needed? https://github.com/apache/spark/pull/43910 merged the parse rule of `PercentileCont` and `PercentileDisc` into `functionCall`, but forgot to remove unused `PERCENTILE_CONT` and `PERCENTILE_DISC` in g4. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? GA. ### Was this patch authored or co-authored using generative AI tooling? 'No'. Closes #46272 from beliefer/SPARK-46009_followup2. Authored-by: beliefer Signed-off-by: Dongjoon Hyun --- docs/sql-ref-ansi-compliance.md| 2 - .../spark/sql/catalyst/parser/SqlBaseLexer.g4 | 2 - .../spark/sql/catalyst/parser/SqlBaseParser.g4 | 2 - .../sql-tests/analyzer-results/window2.sql.out | 126 + .../sql-tests/results/ansi/keywords.sql.out| 4 - .../resources/sql-tests/results/keywords.sql.out | 2 - .../ThriftServerWithSparkContextSuite.scala| 2 +- 7 files changed, 127 insertions(+), 13 deletions(-) diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md index 011bd671ca1f..84416ffd5f83 100644 --- a/docs/sql-ref-ansi-compliance.md +++ b/docs/sql-ref-ansi-compliance.md @@ -608,8 +608,6 @@ Below is a list of all the keywords in Spark SQL. 
|PARTITIONED|non-reserved|non-reserved|non-reserved| |PARTITIONS|non-reserved|non-reserved|non-reserved| |PERCENT|non-reserved|non-reserved|non-reserved| -|PERCENTILE_CONT|reserved|non-reserved|non-reserved| -|PERCENTILE_DISC|reserved|non-reserved|non-reserved| |PIVOT|non-reserved|non-reserved|non-reserved| |PLACING|non-reserved|non-reserved|non-reserved| |POSITION|non-reserved|non-reserved|reserved| diff --git a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 index 83e40c4a20a2..86e16af7ff10 100644 --- a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 +++ b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 @@ -298,8 +298,6 @@ OVERWRITE: 'OVERWRITE'; PARTITION: 'PARTITION'; PARTITIONED: 'PARTITIONED'; PARTITIONS: 'PARTITIONS'; -PERCENTILE_CONT: 'PERCENTILE_CONT'; -PERCENTILE_DISC: 'PERCENTILE_DISC'; PERCENTLIT: 'PERCENT'; PIVOT: 'PIVOT'; PLACING: 'PLACING'; diff --git a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 index 71bd75f934ca..653224c5475f 100644 --- a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 +++ b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 @@ -1829,8 +1829,6 @@ nonReserved | PARTITION | PARTITIONED | PARTITIONS -| PERCENTILE_CONT -| PERCENTILE_DISC | PERCENTLIT | PIVOT | PLACING diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/window2.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/window2.sql.out new file mode 100644 index ..6fd41286959a --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/window2.sql.out @@ -0,0 +1,126 @@ +-- Automatically generated by SQLQueryTestSuite +-- !query +CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES +(null, 1L, 1.0D, date("2017-08-01"), timestamp_seconds(1501545600), "a"), +(1, 1L, 1.0D, date("2017-08-01"), timestamp_seconds(1501545600), "a"), +(1, 2L, 2.5D, date("2017-08-02"), timestamp_seconds(150200), "a"), +(2, 2147483650L, 100.001D, date("2020-12-31"), timestamp_seconds(1609372800), "a"), +(1, null, 1.0D, date("2017-08-01"), timestamp_seconds(1501545600), "b"), +(2, 3L, 3.3D, date("2017-08-03"), timestamp_seconds(150300), "b"), +(3, 2147483650L, 100.001D, date("2020-12-31"), timestamp_seconds(1609372800), "b"), +(null, null, null, null, null, null), +(3, 1L, 1.0D, date("2017-08-01"), timestamp_seconds(1501545600), null) +AS testData(val, val_long, val_double, val_date, val_times
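With the dedicated tokens removed, `percentile_cont` and `percentile_disc` are handled entirely by the generic `functionCall` parse rule. A minimal spark-shell sketch of the syntax this covers, assuming a build that includes SPARK-46009 (the query itself is illustrative):

```scala
// percentile_cont / percentile_disc still parse as ordinary function calls
// with a WITHIN GROUP clause, now routed through the functionCall rule.
spark.sql("""
  SELECT
    percentile_cont(0.25) WITHIN GROUP (ORDER BY id) AS p25,
    percentile_disc(0.50) WITHIN GROUP (ORDER BY id) AS median
  FROM range(10)
""").show()
```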
(spark) branch branch-3.5 updated: Revert "[SPARK-48016][SQL] Fix a bug in try_divide function when with decimals"
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new fc0ef07f2949 Revert "[SPARK-48016][SQL] Fix a bug in try_divide function when with decimals" fc0ef07f2949 is described below commit fc0ef07f2949c399537c6d9b5fb7b81f546de212 Author: Dongjoon Hyun AuthorDate: Wed May 1 11:18:29 2024 -0700 Revert "[SPARK-48016][SQL] Fix a bug in try_divide function when with decimals" This reverts commit e78ee2c5770218a521340cb84f57a02dd00f7f3a. --- .../sql/catalyst/analysis/DecimalPrecision.scala | 14 ++--- .../spark/sql/catalyst/analysis/TypeCoercion.scala | 10 ++-- sql/core/src/test/resources/log4j2.properties | 2 +- .../analyzer-results/ansi/try_arithmetic.sql.out | 56 --- .../analyzer-results/try_arithmetic.sql.out| 56 --- .../resources/sql-tests/inputs/try_arithmetic.sql | 8 --- .../sql-tests/results/ansi/try_arithmetic.sql.out | 64 -- .../sql-tests/results/try_arithmetic.sql.out | 64 -- 8 files changed, 13 insertions(+), 261 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala index f51127f53b38..09cf61a77955 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala @@ -83,7 +83,7 @@ object DecimalPrecision extends TypeCoercionRule { val resultType = widerDecimalType(p1, s1, p2, s2) val newE1 = if (e1.dataType == resultType) e1 else Cast(e1, resultType) val newE2 = if (e2.dataType == resultType) e2 else Cast(e2, resultType) - b.withNewChildren(Seq(newE1, newE2)) + b.makeCopy(Array(newE1, newE2)) } /** @@ -202,21 +202,21 @@ object DecimalPrecision extends TypeCoercionRule { case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] && l.dataType.isInstanceOf[IntegralType] && literalPickMinimumPrecision => - b.withNewChildren(Seq(Cast(l, DataTypeUtils.fromLiteral(l)), r)) + b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r)) case (l, r: Literal) if l.dataType.isInstanceOf[DecimalType] && r.dataType.isInstanceOf[IntegralType] && literalPickMinimumPrecision => - b.withNewChildren(Seq(l, Cast(r, DataTypeUtils.fromLiteral(r + b.makeCopy(Array(l, Cast(r, DataTypeUtils.fromLiteral(r // Promote integers inside a binary expression with fixed-precision decimals to decimals, // and fixed-precision decimals in an expression with floats / doubles to doubles case (l @ IntegralTypeExpression(), r @ DecimalExpression(_, _)) => - b.withNewChildren(Seq(Cast(l, DecimalType.forType(l.dataType)), r)) + b.makeCopy(Array(Cast(l, DecimalType.forType(l.dataType)), r)) case (l @ DecimalExpression(_, _), r @ IntegralTypeExpression()) => - b.withNewChildren(Seq(l, Cast(r, DecimalType.forType(r.dataType + b.makeCopy(Array(l, Cast(r, DecimalType.forType(r.dataType case (l, r @ DecimalExpression(_, _)) if isFloat(l.dataType) => - b.withNewChildren(Seq(l, Cast(r, DoubleType))) + b.makeCopy(Array(l, Cast(r, DoubleType))) case (l @ DecimalExpression(_, _), r) if isFloat(r.dataType) => - b.withNewChildren(Seq(Cast(l, DoubleType), r)) + b.makeCopy(Array(Cast(l, DoubleType), r)) case _ => b } } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala index c9a4a2d40246..190e72a8e669 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala @@ -1102,22 +1102,22 @@ object TypeCoercion extends TypeCoercionBase { case a @ BinaryArithmetic(left @ StringTypeExpression(), right) if right.dataType != CalendarIntervalType => -a.withNewChildren(Seq(Cast(left, DoubleType), right)) +a.makeCopy(Array(Cast(left, DoubleType), right)) case a @ BinaryArithmetic(left, right @ StringTypeExpression()) if left.dataType != CalendarIntervalType => -a.withNewChildren(Seq(left, Cast(right, DoubleType))) +a.makeCopy(Array(left, Cast(right, DoubleType))) // For equality between string and timestamp we cast the string to a timestam
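The revert above only touches branch-3.5; on master, where SPARK-48016 is kept, the regenerated golden files show the decimal and string operand cases returning NULL instead of failing. A minimal sketch of those calls, assuming a spark-shell session:

```scala
// Operand types exercised by SPARK-48016; per the master golden files,
// both of these yield NULL rather than raising an error.
spark.sql("SELECT try_divide(1, decimal(0))").show()
spark.sql("SELECT try_divide(1, '0')").show()
```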
(spark) branch branch-3.4 updated: [SPARK-48068][PYTHON] `mypy` should have `--python-executable` parameter
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 70ce67cc77cc [SPARK-48068][PYTHON] `mypy` should have `--python-executable` parameter 70ce67cc77cc is described below commit 70ce67cc77ccce3a4509bba608dbab69b45cc2b9 Author: Dongjoon Hyun AuthorDate: Wed May 1 10:42:26 2024 -0700 [SPARK-48068][PYTHON] `mypy` should have `--python-executable` parameter ### What changes were proposed in this pull request? This PR aims to fix `mypy` failure by propagating `lint-python`'s `PYTHON_EXECUTABLE` to `mypy`'s parameter correctly. ### Why are the changes needed? We assumed that `PYTHON_EXECUTABLE` is used for `dev/lint-python` like the following. That's not always guaranteed. We need to use `mypy`'s parameter to make it sure. https://github.com/apache/spark/blob/ff401dde50343c9bbc1c49a0294272f2da7d01e2/.github/workflows/build_and_test.yml#L705 This patch is useful whose `python3` chooses one of multiple Python installation like our CI environment. ``` $ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-8905641334 bash WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested root2ef6ce08d2c4:/# python3 --version Python 3.10.12 root2ef6ce08d2c4:/# python3.9 --version Python 3.9.19 ``` For example, the following shows that `PYTHON_EXECUTABLE` is not considered by `mypy`. ``` root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.9 mypy --python-executable=python3.11 --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l 3428 root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.9 mypy --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l 1 root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.11 mypy --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l 1 ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46314 from dongjoon-hyun/SPARK-48068. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit 26c871f180306fbf86ce65f14f8e7a71f89885ed) Signed-off-by: Dongjoon Hyun --- dev/lint-python | 2 ++ 1 file changed, 2 insertions(+) diff --git a/dev/lint-python b/dev/lint-python index b5ee63e38690..9b60ca75eb9b 100755 --- a/dev/lint-python +++ b/dev/lint-python @@ -69,6 +69,7 @@ function mypy_annotation_test { echo "starting mypy annotations test..." MYPY_REPORT=$( ($MYPY_BUILD \ + --python-executable $PYTHON_EXECUTABLE \ --namespace-packages \ --config-file python/mypy.ini \ --cache-dir /tmp/.mypy_cache/ \ @@ -128,6 +129,7 @@ function mypy_examples_test { echo "starting mypy examples test..." MYPY_REPORT=$( (MYPYPATH=python $MYPY_BUILD \ + --python-executable $PYTHON_EXECUTABLE \ --namespace-packages \ --config-file python/mypy.ini \ --exclude "mllib/*" \ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-48068][PYTHON] `mypy` should have `--python-executable` parameter
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 953d7f90c6db [SPARK-48068][PYTHON] `mypy` should have `--python-executable` parameter 953d7f90c6db is described below commit 953d7f90c6dbee597b0360c551dfac2a1d87d961 Author: Dongjoon Hyun AuthorDate: Wed May 1 10:42:26 2024 -0700 [SPARK-48068][PYTHON] `mypy` should have `--python-executable` parameter ### What changes were proposed in this pull request? This PR aims to fix `mypy` failure by propagating `lint-python`'s `PYTHON_EXECUTABLE` to `mypy`'s parameter correctly. ### Why are the changes needed? We assumed that `PYTHON_EXECUTABLE` is used for `dev/lint-python` like the following. That's not always guaranteed. We need to use `mypy`'s parameter to make it sure. https://github.com/apache/spark/blob/ff401dde50343c9bbc1c49a0294272f2da7d01e2/.github/workflows/build_and_test.yml#L705 This patch is useful whose `python3` chooses one of multiple Python installation like our CI environment. ``` $ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-8905641334 bash WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested root2ef6ce08d2c4:/# python3 --version Python 3.10.12 root2ef6ce08d2c4:/# python3.9 --version Python 3.9.19 ``` For example, the following shows that `PYTHON_EXECUTABLE` is not considered by `mypy`. ``` root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.9 mypy --python-executable=python3.11 --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l 3428 root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.9 mypy --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l 1 root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.11 mypy --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l 1 ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46314 from dongjoon-hyun/SPARK-48068. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit 26c871f180306fbf86ce65f14f8e7a71f89885ed) Signed-off-by: Dongjoon Hyun --- dev/lint-python | 2 ++ 1 file changed, 2 insertions(+) diff --git a/dev/lint-python b/dev/lint-python index d040493c86c4..7ccd32451acc 100755 --- a/dev/lint-python +++ b/dev/lint-python @@ -118,6 +118,7 @@ function mypy_annotation_test { echo "starting mypy annotations test..." MYPY_REPORT=$( ($MYPY_BUILD \ + --python-executable $PYTHON_EXECUTABLE \ --namespace-packages \ --config-file python/mypy.ini \ --cache-dir /tmp/.mypy_cache/ \ @@ -177,6 +178,7 @@ function mypy_examples_test { echo "starting mypy examples test..." MYPY_REPORT=$( (MYPYPATH=python $MYPY_BUILD \ + --python-executable $PYTHON_EXECUTABLE \ --namespace-packages \ --config-file python/mypy.ini \ --exclude "mllib/*" \ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48068][PYTHON] `mypy` should have `--python-executable` parameter
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 26c871f18030 [SPARK-48068][PYTHON] `mypy` should have `--python-executable` parameter 26c871f18030 is described below commit 26c871f180306fbf86ce65f14f8e7a71f89885ed Author: Dongjoon Hyun AuthorDate: Wed May 1 10:42:26 2024 -0700 [SPARK-48068][PYTHON] `mypy` should have `--python-executable` parameter ### What changes were proposed in this pull request? This PR aims to fix `mypy` failure by propagating `lint-python`'s `PYTHON_EXECUTABLE` to `mypy`'s parameter correctly. ### Why are the changes needed? We assumed that `PYTHON_EXECUTABLE` is used for `dev/lint-python` like the following. That's not always guaranteed. We need to use `mypy`'s parameter to make it sure. https://github.com/apache/spark/blob/ff401dde50343c9bbc1c49a0294272f2da7d01e2/.github/workflows/build_and_test.yml#L705 This patch is useful whose `python3` chooses one of multiple Python installation like our CI environment. ``` $ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-8905641334 bash WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested root2ef6ce08d2c4:/# python3 --version Python 3.10.12 root2ef6ce08d2c4:/# python3.9 --version Python 3.9.19 ``` For example, the following shows that `PYTHON_EXECUTABLE` is not considered by `mypy`. ``` root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.9 mypy --python-executable=python3.11 --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l 3428 root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.9 mypy --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l 1 root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.11 mypy --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l 1 ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46314 from dongjoon-hyun/SPARK-48068. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/lint-python | 2 ++ 1 file changed, 2 insertions(+) diff --git a/dev/lint-python b/dev/lint-python index 6bd843103bd7..b8703310bc4b 100755 --- a/dev/lint-python +++ b/dev/lint-python @@ -125,6 +125,7 @@ function mypy_annotation_test { echo "starting mypy annotations test..." MYPY_REPORT=$( ($MYPY_BUILD \ + --python-executable $PYTHON_EXECUTABLE \ --namespace-packages \ --config-file python/mypy.ini \ --cache-dir /tmp/.mypy_cache/ \ @@ -184,6 +185,7 @@ function mypy_examples_test { echo "starting mypy examples test..." MYPY_REPORT=$( (MYPYPATH=python $MYPY_BUILD \ + --python-executable $PYTHON_EXECUTABLE \ --namespace-packages \ --config-file python/mypy.ini \ --exclude "mllib/*" \ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48069][INFRA] Handle `PEP-632` by checking `ModuleNotFoundError` on `setuptools` in Python 3.12
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ff401dde5034 [SPARK-48069][INFRA] Handle `PEP-632` by checking `ModuleNotFoundError` on `setuptools` in Python 3.12 ff401dde5034 is described below commit ff401dde50343c9bbc1c49a0294272f2da7d01e2 Author: Dongjoon Hyun AuthorDate: Tue Apr 30 23:54:06 2024 -0700 [SPARK-48069][INFRA] Handle `PEP-632` by checking `ModuleNotFoundError` on `setuptools` in Python 3.12 ### What changes were proposed in this pull request? This PR aims to handle `PEP-632` by checking `ModuleNotFoundError` on `setuptools`. - [PEP 632 – Deprecate distutils module](https://peps.python.org/pep-0632/) ### Why are the changes needed? Use `Python 3.12`. ``` $ python3 --version Python 3.12.2 ``` **BEFORE** ``` $ dev/lint-python --mypy | grep ModuleNotFoundError Traceback (most recent call last): File "", line 1, in ModuleNotFoundError: No module named 'setuptools' ``` **AFTER** ``` $ dev/lint-python --mypy | grep ModuleNotFoundError ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs and manual test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46315 from dongjoon-hyun/SPARK-48069. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/lint-python | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/dev/lint-python b/dev/lint-python index 8d587bd52aca..6bd843103bd7 100755 --- a/dev/lint-python +++ b/dev/lint-python @@ -84,7 +84,10 @@ function satisfies_min_version { local expected_version="$2" echo "$( "$PYTHON_EXECUTABLE" << EOM -from setuptools.extern.packaging import version +try: +from setuptools.extern.packaging import version +except ModuleNotFoundError: +from packaging import version print(version.parse('$provided_version') >= version.parse('$expected_version')) EOM )" - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48016][SQL][TESTS][FOLLOWUP] Update Java 21 golden file
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 65cf5b18648a [SPARK-48016][SQL][TESTS][FOLLOWUP] Update Java 21 golden file 65cf5b18648a is described below commit 65cf5b18648a81fc9b0787d03f23f7465c20f3ec Author: Dongjoon Hyun AuthorDate: Tue Apr 30 22:42:02 2024 -0700 [SPARK-48016][SQL][TESTS][FOLLOWUP] Update Java 21 golden file ### What changes were proposed in this pull request? This is a follow-up of SPARK-48016 to update the missed Java 21 golden file. - #46286 ### Why are the changes needed? To recover Java 21 CIs: - https://github.com/apache/spark/actions/workflows/build_java21.yml - https://github.com/apache/spark/actions/workflows/build_maven_java21.yml - https://github.com/apache/spark/actions/workflows/build_maven_java21_macos14.yml ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manual tests. I regenerated all in Java 21 and this was the only one affected. ``` $ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46313 from dongjoon-hyun/SPARK-48016. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../results/try_arithmetic.sql.out.java21 | 64 ++ 1 file changed, 64 insertions(+) diff --git a/sql/core/src/test/resources/sql-tests/results/try_arithmetic.sql.out.java21 b/sql/core/src/test/resources/sql-tests/results/try_arithmetic.sql.out.java21 index dcdb9d0dcb19..002a0dfcf37e 100644 --- a/sql/core/src/test/resources/sql-tests/results/try_arithmetic.sql.out.java21 +++ b/sql/core/src/test/resources/sql-tests/results/try_arithmetic.sql.out.java21 @@ -15,6 +15,22 @@ struct NULL +-- !query +SELECT try_add(2147483647, decimal(1)) +-- !query schema +struct +-- !query output +2147483648 + + +-- !query +SELECT try_add(2147483647, "1") +-- !query schema +struct +-- !query output +2.147483648E9 + + -- !query SELECT try_add(-2147483648, -1) -- !query schema @@ -249,6 +265,22 @@ struct NULL +-- !query +SELECT try_divide(1, decimal(0)) +-- !query schema +struct +-- !query output +NULL + + +-- !query +SELECT try_divide(1, "0") +-- !query schema +struct +-- !query output +NULL + + -- !query SELECT try_divide(interval 2 year, 2) -- !query schema @@ -313,6 +345,22 @@ struct NULL +-- !query +SELECT try_subtract(2147483647, decimal(-1)) +-- !query schema +struct +-- !query output +2147483648 + + +-- !query +SELECT try_subtract(2147483647, "-1") +-- !query schema +struct +-- !query output +2.147483648E9 + + -- !query SELECT try_subtract(-2147483648, 1) -- !query schema @@ -409,6 +457,22 @@ struct NULL +-- !query +SELECT try_multiply(2147483647, decimal(-2)) +-- !query schema +struct +-- !query output +-4294967294 + + +-- !query +SELECT try_multiply(2147483647, "-2") +-- !query schema +struct +-- !query output +-4.294967294E9 + + -- !query SELECT try_multiply(-2147483648, 2) -- !query schema - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48047][SQL] Reduce memory pressure of empty TreeNode tags
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 02206cd66dbf [SPARK-48047][SQL] Reduce memory pressure of empty TreeNode tags 02206cd66dbf is described below commit 02206cd66dbfc8de602a685b032f1805bcf8e36f Author: Nick Young AuthorDate: Tue Apr 30 22:07:20 2024 -0700 [SPARK-48047][SQL] Reduce memory pressure of empty TreeNode tags ### What changes were proposed in this pull request? - Changed the `tags` variable of the `TreeNode` class to initialize lazily. This will reduce unnecessary driver memory pressure. ### Why are the changes needed? - Plans with large expression or operator trees are known to cause driver memory pressure; this is one step in alleviating that issue. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing UT covers behavior. Outwards facing behavior does not change. ### Was this patch authored or co-authored using generative AI tooling? No Closes #46285 from n-young-db/treenode-tags. Authored-by: Nick Young Signed-off-by: Dongjoon Hyun --- .../apache/spark/sql/catalyst/trees/TreeNode.scala | 24 ++ 1 file changed, 20 insertions(+), 4 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala index 94e893d468b3..dd39f3182bfb 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala @@ -78,8 +78,16 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] /** * A mutable map for holding auxiliary information of this tree node. It will be carried over * when this node is copied via `makeCopy`, or transformed via `transformUp`/`transformDown`. + * We lazily evaluate the `tags` since the default size of a `mutable.Map` is nonzero. This + * will reduce unnecessary memory pressure. */ - private val tags: mutable.Map[TreeNodeTag[_], Any] = mutable.Map.empty + private[this] var _tags: mutable.Map[TreeNodeTag[_], Any] = null + private def tags: mutable.Map[TreeNodeTag[_], Any] = { +if (_tags eq null) { + _tags = mutable.Map.empty +} +_tags + } /** * Default tree pattern [[BitSet] for a [[TreeNode]]. @@ -147,11 +155,13 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] ineffectiveRules.get(ruleId.id) } + def isTagsEmpty: Boolean = (_tags eq null) || _tags.isEmpty + def copyTagsFrom(other: BaseType): Unit = { // SPARK-32753: it only makes sense to copy tags to a new node // but it's too expensive to detect other cases likes node removal // so we make a compromise here to copy tags to node with no tags -if (tags.isEmpty) { +if (isTagsEmpty && !other.isTagsEmpty) { tags ++= other.tags } } @@ -161,11 +171,17 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] } def getTagValue[T](tag: TreeNodeTag[T]): Option[T] = { -tags.get(tag).map(_.asInstanceOf[T]) +if (isTagsEmpty) { + None +} else { + tags.get(tag).map(_.asInstanceOf[T]) +} } def unsetTagValue[T](tag: TreeNodeTag[T]): Unit = { -tags -= tag +if (!isTagsEmpty) { + tags -= tag +} } /** - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
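The same lazy-allocation idea can be shown outside Catalyst. A self-contained sketch, with illustrative (non-Spark) class and method names, of how a node avoids allocating its tag map until the first write:

```scala
import scala.collection.mutable

// The map is only materialized on first write; untagged nodes keep a null
// field and pay no allocation cost, mirroring the TreeNode change above.
class Node {
  private[this] var _tags: mutable.Map[String, Any] = null

  private def tags: mutable.Map[String, Any] = {
    if (_tags eq null) _tags = mutable.Map.empty
    _tags
  }

  def isTagsEmpty: Boolean = (_tags eq null) || _tags.isEmpty

  def setTagValue(key: String, value: Any): Unit = tags(key) = value

  def getTagValue(key: String): Option[Any] =
    if (isTagsEmpty) None else tags.get(key)
}

// Usage: a fresh node reports empty tags without ever allocating the map.
val n = new Node
assert(n.isTagsEmpty)
n.setTagValue("origin", "test")
assert(n.getTagValue("origin").contains("test"))
```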
(spark) branch master updated: [SPARK-48063][CORE] Enable `spark.stage.ignoreDecommissionFetchFailure` by default
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f3cc8f930383 [SPARK-48063][CORE] Enable `spark.stage.ignoreDecommissionFetchFailure` by default f3cc8f930383 is described below commit f3cc8f930383659b9f99e56b38de4b97d588e20b Author: Dongjoon Hyun AuthorDate: Tue Apr 30 15:19:00 2024 -0700 [SPARK-48063][CORE] Enable `spark.stage.ignoreDecommissionFetchFailure` by default ### What changes were proposed in this pull request? This PR aims to **enable `spark.stage.ignoreDecommissionFetchFailure` by default** while keeping `spark.scheduler.maxRetainedRemovedDecommissionExecutors=0` without any change for Apache Spark 4.0.0 in order to help a user use this feature more easily by setting only one configuration, `spark.scheduler.maxRetainedRemovedDecommissionExecutors`. ### Why are the changes needed? This feature was added at Apache Spark 3.4.0 via SPARK-40481 and SPARK-40979 and has been used for two years to support executor decommissioning features in the production. - #37924 - #38441 ### Does this PR introduce _any_ user-facing change? No because `spark.scheduler.maxRetainedRemovedDecommissionExecutors` is still `0`. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46308 from dongjoon-hyun/SPARK-48063. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +- docs/configuration.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala index b2cbb6f6deb6..2e207422ae06 100644 --- a/core/src/main/scala/org/apache/spark/internal/config/package.scala +++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala @@ -2403,7 +2403,7 @@ package object config { s"count ${STAGE_MAX_CONSECUTIVE_ATTEMPTS.key}") .version("3.4.0") .booleanConf - .createWithDefault(false) + .createWithDefault(true) private[spark] val SCHEDULER_MAX_RETAINED_REMOVED_EXECUTORS = ConfigBuilder("spark.scheduler.maxRetainedRemovedDecommissionExecutors") diff --git a/docs/configuration.md b/docs/configuration.md index d5e2a569fdea..2e612ffd9ab9 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -3072,7 +3072,7 @@ Apart from these, the following properties are also available, and may be useful spark.stage.ignoreDecommissionFetchFailure - false + true Whether ignore stage fetch failure caused by executor decommission when count spark.stage.maxConsecutiveAttempts - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
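With this default flip, opting into the feature only requires the retention setting, and opting out means setting the flag back to false. A hedged SparkConf sketch (the retention value of 10 is an arbitrary example):

```scala
import org.apache.spark.SparkConf

// Opting in now only needs the retention setting; the fetch-failure flag
// itself defaults to true as of this commit.
val conf = new SparkConf()
  .set("spark.scheduler.maxRetainedRemovedDecommissionExecutors", "10")

// To restore the pre-4.0 behavior, disable the flag explicitly.
val legacyConf = new SparkConf()
  .set("spark.stage.ignoreDecommissionFetchFailure", "false")
```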
(spark) branch master updated: [SPARK-48060][SS][TESTS] Fix `StreamingQueryHashPartitionVerifySuite` to update golden files correctly
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new faab553cac70 [SPARK-48060][SS][TESTS] Fix `StreamingQueryHashPartitionVerifySuite` to update golden files correctly faab553cac70 is described below commit faab553cac70eefeec286b1823b70ad62bed87f8 Author: Dongjoon Hyun AuthorDate: Tue Apr 30 12:50:07 2024 -0700 [SPARK-48060][SS][TESTS] Fix `StreamingQueryHashPartitionVerifySuite` to update golden files correctly ### What changes were proposed in this pull request? This PR aims to fix `StreamingQueryHashPartitionVerifySuite` to update golden files correctly. - The documentation is added. - Newly generated files are updated. ### Why are the changes needed? Previously, `SPARK_GENERATE_GOLDEN_FILES` doesn't work as expected because it updates the files under `target` directory. We need to update `src/test` files. **BEFORE** ``` $ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *StreamingQueryHashPartitionVerifySuite" $ git status On branch master Your branch is up to date with 'apache/master'. nothing to commit, working tree clean ``` **AFTER** ``` $ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *StreamingQueryHashPartitionVerifySuite" \ -Dspark.sql.test.randomDataGenerator.maxStrLen=100 \ -Dspark.sql.test.randomDataGenerator.maxArraySize=4 $ git status On branch SPARK-48060 Your branch is up to date with 'dongjoon/SPARK-48060'. Changes not staged for commit: (use "git add ..." to update what will be committed) (use "git restore ..." to discard changes in working directory) modified: sql/core/src/test/resources/structured-streaming/partition-tests/randomSchemas modified: sql/core/src/test/resources/structured-streaming/partition-tests/rowsAndPartIds no changes added to commit (use "git add" and/or "git commit -a") ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. I regenerate the data like the following. ``` $ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *StreamingQueryHashPartitionVerifySuite" \ -Dspark.sql.test.randomDataGenerator.maxStrLen=100 \ -Dspark.sql.test.randomDataGenerator.maxArraySize=4 ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46304 from dongjoon-hyun/SPARK-48060. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../partition-tests/randomSchemas | 2 +- .../partition-tests/rowsAndPartIds | Bin 4862115 -> 13341426 bytes .../StreamingQueryHashPartitionVerifySuite.scala | 22 +++-- 3 files changed, 17 insertions(+), 7 deletions(-) diff --git a/sql/core/src/test/resources/structured-streaming/partition-tests/randomSchemas b/sql/core/src/test/resources/structured-streaming/partition-tests/randomSchemas index 8d6ff942610c..f6eadd776cc6 100644 --- a/sql/core/src/test/resources/structured-streaming/partition-tests/randomSchemas +++ b/sql/core/src/test/resources/structured-streaming/partition-tests/randomSchemas @@ -1 +1 @@ -col_0 STRUCT NOT NULL, col_3: FLOAT NOT NULL, col_4: INT NOT NULL>,col_1 STRUCT, col_3: ARRAY NOT NULL, col_4: ARRAY, col_5: TIMESTAMP NOT NULL, col_6: STRUCT, col_1: BIGINT NOT NULL> NOT NULL, col_7: ARRAY NOT NULL, col_8: ARRAY, col_9: BIGINT NOT NULL> NOT NULL,col_2 BIGINT NOT NULL,col_3 STRUCT,col_1 STRUCT NOT NULL,col_2 STRING NOT NULL,col_3 STRUCT, col_2: ARRAY NOT NULL> NOT NULL,col_4 BINARY NOT NULL,col_5 ARRAY NOT NULL,col_6 ARRAY,col_7 DOUBLE NOT NULL,col_8 ARRAY NOT NULL,col_9 ARRAY,col_10 FLOAT NOT NULL,col_11 STRUCT NOT NULL>, col_1: STRUCT NOT NULL, col_1: INT, col_2: STRUCT
(spark) branch master updated: [SPARK-48057][PYTHON][CONNECT][TESTS] Enable `GroupedApplyInPandasTests.test_grouped_with_empty_partition`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new dab20b31388b [SPARK-48057][PYTHON][CONNECT][TESTS] Enable `GroupedApplyInPandasTests.test_grouped_with_empty_partition` dab20b31388b is described below commit dab20b31388ba7bcd2ab4d4424cbbd072bf84c30 Author: Ruifeng Zheng AuthorDate: Tue Apr 30 12:19:18 2024 -0700 [SPARK-48057][PYTHON][CONNECT][TESTS] Enable `GroupedApplyInPandasTests.test_grouped_with_empty_partition` ### What changes were proposed in this pull request? Enable `GroupedApplyInPandasTests. test_grouped_with_empty_partition` ### Why are the changes needed? test coverage ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ci ### Was this patch authored or co-authored using generative AI tooling? no Closes #46299 from zhengruifeng/fix_test_grouped_with_empty_partition. Authored-by: Ruifeng Zheng Signed-off-by: Dongjoon Hyun --- python/pyspark/sql/tests/connect/test_parity_pandas_grouped_map.py | 4 python/pyspark/sql/tests/pandas/test_pandas_grouped_map.py | 4 ++-- 2 files changed, 2 insertions(+), 6 deletions(-) diff --git a/python/pyspark/sql/tests/connect/test_parity_pandas_grouped_map.py b/python/pyspark/sql/tests/connect/test_parity_pandas_grouped_map.py index 1cc4ce012623..8a1da440c799 100644 --- a/python/pyspark/sql/tests/connect/test_parity_pandas_grouped_map.py +++ b/python/pyspark/sql/tests/connect/test_parity_pandas_grouped_map.py @@ -38,10 +38,6 @@ class GroupedApplyInPandasTests(GroupedApplyInPandasTestsMixin, ReusedConnectTes def test_apply_in_pandas_returning_incompatible_type(self): super().test_apply_in_pandas_returning_incompatible_type() -@unittest.skip("Spark Connect doesn't support RDD but the test depends on it.") -def test_grouped_with_empty_partition(self): -super().test_grouped_with_empty_partition() - if __name__ == "__main__": from pyspark.sql.tests.connect.test_parity_pandas_grouped_map import * # noqa: F401 diff --git a/python/pyspark/sql/tests/pandas/test_pandas_grouped_map.py b/python/pyspark/sql/tests/pandas/test_pandas_grouped_map.py index f43dafc0a4a1..1e86e12eb74f 100644 --- a/python/pyspark/sql/tests/pandas/test_pandas_grouped_map.py +++ b/python/pyspark/sql/tests/pandas/test_pandas_grouped_map.py @@ -680,13 +680,13 @@ class GroupedApplyInPandasTestsMixin: data = [Row(id=1, x=2), Row(id=1, x=3), Row(id=2, x=4)] expected = [Row(id=1, x=5), Row(id=1, x=5), Row(id=2, x=4)] num_parts = len(data) + 1 -df = self.spark.createDataFrame(self.sc.parallelize(data, numSlices=num_parts)) +df = self.spark.createDataFrame(data).repartition(num_parts) f = pandas_udf( lambda pdf: pdf.assign(x=pdf["x"].sum()), "id long, x int", PandasUDFType.GROUPED_MAP ) -result = df.groupBy("id").apply(f).collect() +result = df.groupBy("id").apply(f).sort("id").collect() self.assertEqual(result, expected) def test_grouped_over_window(self): - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (0329479acb67 -> 9caa6f7f8b8e)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


    from 0329479acb67 [SPARK-47359][SQL] Support TRANSLATE function to work with collated strings
     add 9caa6f7f8b8e [SPARK-48061][SQL][TESTS] Parameterize max limits of `spark.sql.test.randomDataGenerator`

No new revisions were added by this update.

Summary of changes:
 .../test/scala/org/apache/spark/sql/RandomDataGenerator.scala | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-46122][SQL] Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9e8c4aa3f43a [SPARK-46122][SQL] Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default 9e8c4aa3f43a is described below commit 9e8c4aa3f43a3d99bff56cca319db623abc473ee Author: Dongjoon Hyun AuthorDate: Tue Apr 30 01:44:37 2024 -0700 [SPARK-46122][SQL] Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default ### What changes were proposed in this pull request? This PR aims to switch `spark.sql.legacy.createHiveTableByDefault` to `false` by default in order to move away from this legacy behavior from `Apache Spark 4.0.0` while the legacy functionality will be preserved during Apache Spark 4.x period by setting `spark.sql.legacy.createHiveTableByDefault=true`. ### Why are the changes needed? Historically, this behavior change was merged at `Apache Spark 3.0.0` activity in SPARK-30098 and reverted officially during the `3.0.0 RC` period. - 2019-12-06: #26736 (58be82a) - 2019-12-06: https://lists.apache.org/thread/g90dz1og1zt4rr5h091rn1zqo50y759j - 2020-05-16: #28517 At `Apache Spark 3.1.0`, we had another discussion and defined it as `Legacy` behavior via a new configuration by reusing the JIRA ID, SPARK-30098. - 2020-12-01: https://lists.apache.org/thread/8c8k1jk61pzlcosz3mxo4rkj5l23r204 - 2020-12-03: #30554 Last year, this was proposed again twice and `Apache Spark 4.0.0` is a good time to make a decision for Apache Spark future direction. - SPARK-42603 on 2023-02-27 as an independent idea. - SPARK-46122 on 2023-11-27 as a part of Apache Spark 4.0.0 idea ### Does this PR introduce _any_ user-facing change? Yes, the migration document is updated. ### How was this patch tested? Pass the CIs with the adjusted test cases. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46207 from dongjoon-hyun/SPARK-46122. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- docs/sql-migration-guide.md | 1 + python/pyspark/sql/tests/test_readwriter.py | 5 ++--- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala| 2 +- .../apache/spark/sql/execution/command/PlanResolutionSuite.scala | 8 +++- 4 files changed, 7 insertions(+), 9 deletions(-) diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 1e0fdadde1e3..07562babc87d 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -25,6 +25,7 @@ license: | ## Upgrading from Spark SQL 3.5 to 4.0 - Since Spark 4.0, `spark.sql.ansi.enabled` is on by default. To restore the previous behavior, set `spark.sql.ansi.enabled` to `false` or `SPARK_ANSI_SQL_MODE` to `false`. +- Since Spark 4.0, `CREATE TABLE` syntax without `USING` and `STORED AS` will use the value of `spark.sql.sources.default` as the table provider instead of `Hive`. To restore the previous behavior, set `spark.sql.legacy.createHiveTableByDefault` to `true`. - Since Spark 4.0, the default behaviour when inserting elements in a map is changed to first normalize keys -0.0 to 0.0. The affected SQL functions are `create_map`, `map_from_arrays`, `map_from_entries`, and `map_concat`. To restore the previous behaviour, set `spark.sql.legacy.disableMapKeyNormalization` to `true`. - Since Spark 4.0, the default value of `spark.sql.maxSinglePartitionBytes` is changed from `Long.MaxValue` to `128m`. 
To restore the previous behavior, set `spark.sql.maxSinglePartitionBytes` to `9223372036854775807`(`Long.MaxValue`). - Since Spark 4.0, any read of SQL tables takes into consideration the SQL configs `spark.sql.files.ignoreCorruptFiles`/`spark.sql.files.ignoreMissingFiles` instead of the core config `spark.files.ignoreCorruptFiles`/`spark.files.ignoreMissingFiles`. diff --git a/python/pyspark/sql/tests/test_readwriter.py b/python/pyspark/sql/tests/test_readwriter.py index 5784d2c72973..e752856d0316 100644 --- a/python/pyspark/sql/tests/test_readwriter.py +++ b/python/pyspark/sql/tests/test_readwriter.py @@ -247,10 +247,9 @@ class ReadwriterV2TestsMixin: def test_create_without_provider(self): df = self.df -with self.assertRaisesRegex( -AnalysisException, "NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT" -): +with self.table("test_table"): df.writeTo("test_table").create() +self.assertEqual(100, self.spark.sql("select * from test_table").count()) def test_table_overwrite(self): df = self.df diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.
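A short sketch of the behavior change described in the migration note, assuming a spark-shell session; the legacy path at the end additionally assumes the session was built with Hive support:

```scala
// With the new default, CREATE TABLE without USING / STORED AS picks the
// spark.sql.sources.default provider instead of creating a Hive SerDe table.
spark.sql("CREATE TABLE t_datasource (id INT)")

// Restoring the legacy behavior for the 4.x line:
spark.sql("SET spark.sql.legacy.createHiveTableByDefault=true")
spark.sql("CREATE TABLE t_hive (id INT)")  // Hive SerDe table again (needs Hive support)
```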
(spark) branch master updated: [SPARK-48042][SQL] Use a timestamp formatter with timezone at class level instead of making copies at method level
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c9ed9dfccb72 [SPARK-48042][SQL] Use a timestamp formatter with timezone at class level instead of making copies at method level c9ed9dfccb72 is described below commit c9ed9dfccb72bc8d30557dcd2809c298a75c3f69 Author: Kent Yao AuthorDate: Mon Apr 29 11:13:39 2024 -0700 [SPARK-48042][SQL] Use a timestamp formatter with timezone at class level instead of making copies at method level ### What changes were proposed in this pull request? This PR creates a timestamp formatter with the timezone directly for formatting. Previously, we called `withZone` for every value in the `format` function. Because the original `zoneId` in the formatter is null and never equals the one we pass in, it creates new copies of the formatter over and over. ```java ... * * param zone the new override zone, null if no override * return a formatter based on this formatter with the requested override zone, not null */ public DateTimeFormatter withZone(ZoneId zone) { if (Objects.equals(this.zone, zone)) { return this; } return new DateTimeFormatter(printerParser, locale, decimalStyle, resolverStyle, resolverFields, chrono, zone); } ``` ### Why are the changes needed? improvement ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? - Existing tests - I also ran the DateTimeBenchmark result locally, there's no performance gain at least for these cases. ### Was this patch authored or co-authored using generative AI tooling? no Closes #46282 from yaooqinn/SPARK-48042. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/sql/catalyst/util/TimestampFormatter.scala | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala index d59b52a3818a..9f57f8375c54 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala @@ -162,6 +162,9 @@ class Iso8601TimestampFormatter( protected lazy val formatter: DateTimeFormatter = getOrCreateFormatter(pattern, locale, isParsing) + @transient + private lazy val zonedFormatter: DateTimeFormatter = formatter.withZone(zoneId) + @transient protected lazy val legacyFormatter = TimestampFormatter.getLegacyFormatter( pattern, zoneId, locale, legacyFormat) @@ -231,7 +234,7 @@ class Iso8601TimestampFormatter( override def format(instant: Instant): String = { try { - formatter.withZone(zoneId).format(instant) + zonedFormatter.format(instant) } catch checkFormattedDiff(toJavaTimestamp(instantToMicros(instant)), (t: Timestamp) => format(t)) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
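The change amounts to binding the zone to the formatter once rather than per formatted value. A standalone java.time sketch of the before/after shapes (pattern and zone are illustrative):

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

val zoneId = ZoneId.of("America/Los_Angeles")
val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

// Before: withZone() runs per value; since the formatter's own zone is null,
// Objects.equals never matches and a new formatter copy is built every time.
def formatSlow(i: Instant): String = formatter.withZone(zoneId).format(i)

// After: bind the zone once and reuse the resulting formatter for every value.
lazy val zonedFormatter: DateTimeFormatter = formatter.withZone(zoneId)
def formatFast(i: Instant): String = zonedFormatter.format(i)

formatFast(Instant.now())
```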
(spark) branch master updated (f781d153a5e4 -> c35a21e5984f)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


    from f781d153a5e4 [SPARK-48046][K8S] Remove `clock` parameter from `DriverServiceFeatureStep`
     add c35a21e5984f [SPARK-48044][PYTHON][CONNECT] Cache `DataFrame.isStreaming`

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/connect/dataframe.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (d42c10d9411d -> f781d153a5e4)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


    from d42c10d9411d [SPARK-47693][TESTS][FOLLOWUP] Reduce CollationBenchmarks time
     add f781d153a5e4 [SPARK-48046][K8S] Remove `clock` parameter from `DriverServiceFeatureStep`

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/deploy/k8s/features/DriverServiceFeatureStep.scala | 4 +---
 .../spark/deploy/k8s/features/DriverServiceFeatureStepSuite.scala   | 2 +-
 2 files changed, 2 insertions(+), 4 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (ccb0eb699f7c -> d42c10d9411d)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


    from ccb0eb699f7c [SPARK-48038][K8S] Promote driverServiceName to KubernetesDriverConf
     add d42c10d9411d [SPARK-47693][TESTS][FOLLOWUP] Reduce CollationBenchmarks time

No new revisions were added by this update.

Summary of changes:
 .../execution/benchmark/CollationBenchmark.scala | 38 --
 1 file changed, 20 insertions(+), 18 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48038][K8S] Promote driverServiceName to KubernetesDriverConf
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ccb0eb699f7c [SPARK-48038][K8S] Promote driverServiceName to KubernetesDriverConf ccb0eb699f7c is described below commit ccb0eb699f7c54aa3902d1ebbb34684693b563de Author: Cheng Pan AuthorDate: Mon Apr 29 08:35:13 2024 -0700 [SPARK-48038][K8S] Promote driverServiceName to KubernetesDriverConf ### What changes were proposed in this pull request? Promote `driverServiceName` from `DriverServiceFeatureStep` to `KubernetesDriverConf`. ### Why are the changes needed? To allow other feature steps, e.g. ingress(proposed in SPARK-47954), to access `driverServiceName`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? UT has been updated. ### Was this patch authored or co-authored using generative AI tooling? No Closes #46276 from pan3793/SPARK-48038. Authored-by: Cheng Pan Signed-off-by: Dongjoon Hyun --- .../apache/spark/deploy/k8s/KubernetesConf.scala | 22 +++--- .../k8s/features/DriverServiceFeatureStep.scala| 14 ++ .../spark/deploy/k8s/KubernetesTestConf.scala | 6 -- .../features/DriverServiceFeatureStepSuite.scala | 17 + 4 files changed, 34 insertions(+), 25 deletions(-) diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala index b55f9317d10b..fda772b737fe 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala @@ -24,12 +24,13 @@ import org.apache.commons.lang3.StringUtils import org.apache.spark.{SPARK_VERSION, SparkConf} import org.apache.spark.deploy.k8s.Config._ import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.features.DriverServiceFeatureStep._ import org.apache.spark.deploy.k8s.submit._ import org.apache.spark.internal.{Logging, MDC} import org.apache.spark.internal.LogKeys.{CONFIG, EXECUTOR_ENV_REGEX} import org.apache.spark.internal.config.ConfigEntry import org.apache.spark.resource.ResourceProfile.DEFAULT_RESOURCE_PROFILE_ID -import org.apache.spark.util.Utils +import org.apache.spark.util.{Clock, SystemClock, Utils} /** * Structure containing metadata for Kubernetes logic to build Spark pods. @@ -83,12 +84,27 @@ private[spark] class KubernetesDriverConf( val mainAppResource: MainAppResource, val mainClass: String, val appArgs: Array[String], -val proxyUser: Option[String]) - extends KubernetesConf(sparkConf) { +val proxyUser: Option[String], +clock: Clock = new SystemClock()) + extends KubernetesConf(sparkConf) with Logging { def driverNodeSelector: Map[String, String] = KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, KUBERNETES_DRIVER_NODE_SELECTOR_PREFIX) + lazy val driverServiceName: String = { +val preferredServiceName = s"$resourceNamePrefix$DRIVER_SVC_POSTFIX" +if (preferredServiceName.length <= MAX_SERVICE_NAME_LENGTH) { + preferredServiceName +} else { + val randomServiceId = KubernetesUtils.uniqueID(clock) + val shorterServiceName = s"spark-$randomServiceId$DRIVER_SVC_POSTFIX" + logWarning(s"Driver's hostname would preferably be $preferredServiceName, but this is " + +s"too long (must be <= $MAX_SERVICE_NAME_LENGTH characters). 
Falling back to use " + +s"$shorterServiceName as the driver service's name.") + shorterServiceName +} + } + override val resourceNamePrefix: String = { val custom = if (Utils.isTesting) get(KUBERNETES_DRIVER_POD_NAME_PREFIX) else None custom.getOrElse(KubernetesConf.getResourceNamePrefix(appName)) diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverServiceFeatureStep.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverServiceFeatureStep.scala index cba4f442371c..9adfb2b8de49 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverServiceFeatureStep.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverServiceFeatureStep.scala @@ -20,7 +20,7 @@ import scala.jdk.CollectionConverters._ import io.fabric8.kubernetes.api.model.{HasMetadata, ServiceBuilder} -import org.apache.spark.deploy.k8s.{KubernetesDriverConf, KubernetesUtils, SparkPod} +import org
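The service-name fallback that moved into `KubernetesDriverConf` can be read in isolation. A simplified sketch, where the 63-character limit and the `-driver-svc` suffix are assumed stand-ins for `MAX_SERVICE_NAME_LENGTH` and `DRIVER_SVC_POSTFIX`:

```scala
// Simplified view of the lazy driverServiceName computation: prefer the
// prefix-based name, fall back to a random-id name when it is too long.
val MaxServiceNameLength = 63          // assumed value of MAX_SERVICE_NAME_LENGTH
val DriverSvcPostfix = "-driver-svc"   // assumed value of DRIVER_SVC_POSTFIX

def driverServiceName(resourceNamePrefix: String, randomServiceId: String): String = {
  val preferred = s"$resourceNamePrefix$DriverSvcPostfix"
  if (preferred.length <= MaxServiceNameLength) preferred
  else s"spark-$randomServiceId$DriverSvcPostfix"
}

driverServiceName("spark-pi-abc123", "1714400000000")
```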
(spark) branch master updated: [MINOR][DOCS] Remove space in the middle of configuration name in Arrow-optimized Python UDF page
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ff0751a56f01 [MINOR][DOCS] Remove space in the middle of configuration name in Arrow-optimized Python UDF page ff0751a56f01 is described below commit ff0751a56f010a6bf8a9ae86ddf0868bee615848 Author: Hyukjin Kwon AuthorDate: Sun Apr 28 22:34:30 2024 -0700 [MINOR][DOCS] Remove space in the middle of configuration name in Arrow-optimized Python UDF page ### What changes were proposed in this pull request? This PR removes a space in the middle of configuration name in Arrow-optimized Python UDF page. ![Screenshot 2024-04-29 at 1 53 42 PM](https://github.com/apache/spark/assets/6477701/46b7c448-fb30-4838-a5ba-c8f1c23398fd) https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html#arrow-python-udfs ### Why are the changes needed? So users can copy and paste the configuration names properly. ### Does this PR introduce _any_ user-facing change? Yes it fixes the doc. ### How was this patch tested? Manually built the docs, and checked. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46274 from HyukjinKwon/fix-minor-typo. Authored-by: Hyukjin Kwon Signed-off-by: Dongjoon Hyun --- python/docs/source/user_guide/sql/arrow_pandas.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/python/docs/source/user_guide/sql/arrow_pandas.rst b/python/docs/source/user_guide/sql/arrow_pandas.rst index a5dfb9aa4e52..1d6a4df60690 100644 --- a/python/docs/source/user_guide/sql/arrow_pandas.rst +++ b/python/docs/source/user_guide/sql/arrow_pandas.rst @@ -339,9 +339,9 @@ Arrow Python UDFs Arrow Python UDFs are user defined functions that are executed row-by-row, utilizing Arrow for efficient batch data transfer and serialization. To define an Arrow Python UDF, you can use the :meth:`udf` decorator or wrap the function with the :meth:`udf` method, ensuring the ``useArrow`` parameter is set to True. Additionally, you can enable Arrow -optimization for Python UDFs throughout the entire SparkSession by setting the Spark configuration ``spark.sql -.execution.pythonUDF.arrow.enabled`` to true. It's important to note that the Spark configuration takes effect only -when ``useArrow`` is either not set or set to None. +optimization for Python UDFs throughout the entire SparkSession by setting the Spark configuration +``spark.sql.execution.pythonUDF.arrow.enabled`` to true. It's important to note that the Spark configuration takes +effect only when ``useArrow`` is either not set or set to None. The type hints for Arrow Python UDFs should be specified in the same way as for default, pickled Python UDFs. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (9a42610d5ad8 -> e1445e3f1cf5)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 9a42610d5ad8 [SPARK-48029][INFRA] Update the packages name removed in building the spark docker image add e1445e3f1cf5 [SPARK-48036][DOCS] Update `sql-ref-ansi-compliance.md` and `sql-ref-identifier.md` No new revisions were added by this update. Summary of changes: docs/sql-ref-ansi-compliance.md | 14 ++ docs/sql-ref-identifier.md | 2 +- 2 files changed, 7 insertions(+), 9 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48029][INFRA] Update the packages name removed in building the spark docker image
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9a42610d5ad8 [SPARK-48029][INFRA] Update the packages name removed in building the spark docker image 9a42610d5ad8 is described below commit 9a42610d5ad8ae0ded92fb68c7617861cfe975e1 Author: panbingkun AuthorDate: Sun Apr 28 21:43:47 2024 -0700 [SPARK-48029][INFRA] Update the packages name removed in building the spark docker image ### What changes were proposed in this pull request? The pr aims to update the packages name removed in building the spark docker image. ### Why are the changes needed? When our default image base was switched from `ubuntu 20.04` to `ubuntu 22.04`, the unused installation package in the base image has changed, in order to eliminate some warnings in building images and free disk space more accurately, we need to correct it. Before: ``` #35 [29/31] RUN apt-get remove --purge -y '^aspnet.*' '^dotnet-.*' '^llvm-.*' 'php.*' '^mongodb-.*' snapd google-chrome-stable microsoft-edge-stable firefox azure-cli google-cloud-sdk mono-devel powershell libgl1-mesa-dri || true #35 0.489 Reading package lists... #35 0.505 Building dependency tree... #35 0.507 Reading state information... #35 0.511 E: Unable to locate package ^aspnet.* #35 0.511 E: Couldn't find any package by glob '^aspnet.*' #35 0.511 E: Couldn't find any package by regex '^aspnet.*' #35 0.511 E: Unable to locate package ^dotnet-.* #35 0.511 E: Couldn't find any package by glob '^dotnet-.*' #35 0.511 E: Couldn't find any package by regex '^dotnet-.*' #35 0.511 E: Unable to locate package ^llvm-.* #35 0.511 E: Couldn't find any package by glob '^llvm-.*' #35 0.511 E: Couldn't find any package by regex '^llvm-.*' #35 0.511 E: Unable to locate package ^mongodb-.* #35 0.511 E: Couldn't find any package by glob '^mongodb-.*' #35 0.511 EPackage 'php-crypt-gpg' is not installed, so not removed #35 0.511 Package 'php' is not installed, so not removed #35 0.511 : Couldn't find any package by regex '^mongodb-.*' #35 0.511 E: Unable to locate package snapd #35 0.511 E: Unable to locate package google-chrome-stable #35 0.511 E: Unable to locate package microsoft-edge-stable #35 0.511 E: Unable to locate package firefox #35 0.511 E: Unable to locate package azure-cli #35 0.511 E: Unable to locate package google-cloud-sdk #35 0.511 E: Unable to locate package mono-devel #35 0.511 E: Unable to locate package powershell #35 DONE 0.5s #36 [30/31] RUN apt-get autoremove --purge -y #36 0.063 Reading package lists... #36 0.079 Building dependency tree... #36 0.082 Reading state information... #36 0.088 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded. #36 DONE 0.4s ``` After: ``` #38 [32/36] RUN apt-get remove --purge -y 'gfortran-11' 'humanity-icon-theme' 'nodejs-doc' || true #38 0.066 Reading package lists... #38 0.087 Building dependency tree... #38 0.089 Reading state information... 
#38 0.094 The following packages were automatically installed and are no longer required: #38 0.094 at-spi2-core bzip2-doc dbus-user-session dconf-gsettings-backend #38 0.095 dconf-service gsettings-desktop-schemas gtk-update-icon-cache #38 0.095 hicolor-icon-theme libatk-bridge2.0-0 libatk1.0-0 libatk1.0-data #38 0.095 libatspi2.0-0 libbz2-dev libcairo-gobject2 libcolord2 libdconf1 libepoxy0 #38 0.095 libgfortran-11-dev libgtk-3-common libjs-highlight.js libllvm11 #38 0.095 libncurses-dev libncurses5-dev libphobos2-ldc-shared98 libreadline-dev #38 0.095 librsvg2-2 librsvg2-common libvte-2.91-common libwayland-client0 #38 0.095 libwayland-cursor0 libwayland-egl1 libxdamage1 libxkbcommon0 #38 0.095 session-migration tilix-common xkb-data #38 0.095 Use 'apt autoremove' to remove them. #38 0.096 The following packages will be REMOVED: #38 0.096 adwaita-icon-theme* gfortran* gfortran-11* humanity-icon-theme* libgtk-3-0* #38 0.096 libgtk-3-bin* libgtkd-3-0* libvte-2.91-0* libvted-3-0* nodejs-doc* #38 0.096 r-base-dev* tilix* ubuntu-mono* #38 0.248 0 upgraded, 0 newly installed, 13 to remove and 0 not upgraded. #38 0.248 After this operation, 99.6 MB disk space will be freed. ... (Reading database ... 70597 files and directories currently installed.) #38 0.304 Removing r-base-dev (4.1.2-1ubuntu2) ... #38 0.319 Removing gfortran (4:11.2.0-1ubuntu1) ... #38 0.340 Removing gfortran-11 (11.4.0-1ubuntu1~22.04) ... #38 0.356 Removing tilix (1.9.4-2build1) ... #38 0.377 Removing libvted-3-0:amd64 (3.10.0-1ubuntu1) ... #38 0.392 Removing libvte-2.91-0
(spark) branch master updated (3d62dd72a58f -> 8f1634e833ce)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 3d62dd72a58f [SPARK-47730][K8S] Support `APP_ID` and `EXECUTOR_ID` placeholders in labels add 8f1634e833ce [SPARK-48032][BUILD] Upgrade `commons-codec` to 1.17.0 No new revisions were added by this update. Summary of changes: dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47730][K8S] Support `APP_ID` and `EXECUTOR_ID` placeholders in labels
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3d62dd72a58f [SPARK-47730][K8S] Support `APP_ID` and `EXECUTOR_ID` placeholders in labels 3d62dd72a58f is described below commit 3d62dd72a58f5a19e9a371acc09604ab9ceb9e68 Author: Xi Chen AuthorDate: Sun Apr 28 18:30:06 2024 -0700 [SPARK-47730][K8S] Support `APP_ID` and `EXECUTOR_ID` placeholders in labels ### What changes were proposed in this pull request? Currently, only the pod annotations supports `APP_ID` and `EXECUTOR_ID` placeholders. This commit aims to add the same function to pod labels. ### Why are the changes needed? The use case is to support using customized labels for availability zone based topology pod affinity. We want to use the Spark application ID as the customized label value, to allow Spark executor pods to run in the same availability zone as Spark driver pod. Although we can use the Spark internal label `spark-app-selector` directly, this is not a good practice when using it along with YuniKorn Gang Scheduling. When Gang Scheduling is enabled, the YuniKorn placeholder pods should use the same affinity as real Spark pods. In this way, we have to add the internal `spark-app-selector` label to the placeholder pods. This is not good because the placeholder pods could be recognized as Spark pods in the monitoring system. Thus we propose supporting the `APP_ID` and `EXECUTOR_ID` placeholders in Spark pod labels as well for flexibility. ### Does this PR introduce _any_ user-facing change? No because the pattern strings are very specific. ### How was this patch tested? Unit tests. ### Was this patch authored or co-authored using generative AI tooling? No Closes #46149 from jshmchenxi/SPARK-47730/support-app-placeholder-in-labels. 
Authored-by: Xi Chen Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/deploy/k8s/KubernetesConf.scala | 10 ++ .../org/apache/spark/deploy/k8s/KubernetesConfSuite.scala | 13 ++--- .../deploy/k8s/features/BasicDriverFeatureStepSuite.scala | 11 +++ .../spark/deploy/k8s/integrationtest/BasicTestsSuite.scala | 6 -- .../spark/deploy/k8s/integrationtest/KubernetesSuite.scala | 6 -- 5 files changed, 31 insertions(+), 15 deletions(-) diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala index a1ef04f4e311..b55f9317d10b 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala @@ -100,8 +100,9 @@ private[spark] class KubernetesDriverConf( SPARK_APP_ID_LABEL -> appId, SPARK_APP_NAME_LABEL -> KubernetesConf.getAppNameLabel(appName), SPARK_ROLE_LABEL -> SPARK_POD_DRIVER_ROLE) -val driverCustomLabels = KubernetesUtils.parsePrefixedKeyValuePairs( - sparkConf, KUBERNETES_DRIVER_LABEL_PREFIX) +val driverCustomLabels = + KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, KUBERNETES_DRIVER_LABEL_PREFIX) +.map { case(k, v) => (k, Utils.substituteAppNExecIds(v, appId, "")) } presetLabels.keys.foreach { key => require( @@ -173,8 +174,9 @@ private[spark] class KubernetesExecutorConf( SPARK_ROLE_LABEL -> SPARK_POD_EXECUTOR_ROLE, SPARK_RESOURCE_PROFILE_ID_LABEL -> resourceProfileId.toString) -val executorCustomLabels = KubernetesUtils.parsePrefixedKeyValuePairs( - sparkConf, KUBERNETES_EXECUTOR_LABEL_PREFIX) +val executorCustomLabels = + KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, KUBERNETES_EXECUTOR_LABEL_PREFIX) +.map { case(k, v) => (k, Utils.substituteAppNExecIds(v, appId, executorId)) } presetLabels.keys.foreach { key => require( diff --git a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala index 9963db016ad9..3c53e9b74f92 100644 --- a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala +++ b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala @@ -40,7 +40,9 @@ class KubernetesConfSuite extends SparkFunSuite { "execNodeSelectorKey2" -> "execNodeSelectorValue2") private val CUSTOM_LABELS = Map( "customLabel1Key" -> "customLabe
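As a usage illustration of the placeholder support added above, a hedged sketch built on the existing `spark.kubernetes.driver.label.[LabelName]` / `spark.kubernetes.executor.label.[LabelName]` configuration keys; the label names and cluster details are invented for the example, and the `{{APP_ID}}` / `{{EXECUTOR_ID}}` spelling assumes the same placeholder convention already used for pod annotations.

```
from pyspark.sql import SparkSession

# Hedged sketch: Spark is expected to substitute {{APP_ID}} / {{EXECUTOR_ID}}
# in these label values when it creates the driver and executor pods.
spark = (
    SparkSession.builder
    .master("k8s://https://example-apiserver:6443")                   # placeholder API server
    .config("spark.kubernetes.container.image", "example/spark:dev")  # placeholder image
    .config("spark.kubernetes.driver.label.topology-group", "{{APP_ID}}")
    .config("spark.kubernetes.executor.label.topology-group", "{{APP_ID}}")
    .config("spark.kubernetes.executor.label.exec-id", "{{EXECUTOR_ID}}")
    .getOrCreate()
)
```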
(spark) branch master updated: [SPARK-48021][ML][BUILD] Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 64d321926bbc [SPARK-48021][ML][BUILD] Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions` 64d321926bbc is described below commit 64d321926bbcede05d1c145405d503b3431f185b Author: panbingkun AuthorDate: Sat Apr 27 17:38:55 2024 -0700 [SPARK-48021][ML][BUILD] Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions` ### What changes were proposed in this pull request? The pr aims to: - add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions` - remove `jdk.incubator.foreign` and `-Dforeign.restricted=warn` from `SparkBuild.scala` ### Why are the changes needed? 1.`jdk.incubator.vector` First introduction: https://github.com/apache/spark/pull/30810 https://github.com/apache/spark/pull/30810/files#diff-6f545c33f2fcc975200bf208c900a600a593ce6b170180f81e2f93b3efb6cb3e https://github.com/apache/spark/assets/15246973/6ac7919a-5d82-475c-b8a2-7d9de71acacc;> Why should we add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions`, Because when we only add `--add-modules=jdk.incubator.vector` to `SparkBuild.scala`, it will only take effect when compiling, as follows: ``` build/sbt "mllib-local/Test/runMain org.apache.spark.ml.linalg.BLASBenchmark" ... ``` https://github.com/apache/spark/assets/15246973/54d5f55f-cefe-4126-b255-69488f8699a6;> However, when we use `spark-submit`, it is as follows: ``` ./bin/spark-submit --class org.apache.spark.ml.linalg.BLASBenchmark /Users/panbingkun/Developer/spark/spark-community/mllib-local/target/scala-2.13/spark-mllib-local_2.13-4.0.0-SNAPSHOT-tests.jar ``` https://github.com/apache/spark/assets/15246973/8e02fa93-fef4-4cdc-96bd-908b3e9baea1;> Obviously, `--add-modules=jdk.incubator.vector` does not take effect in the `Spark runtime`, so I propose adding `--add-modules=jdk.incubator.vector` to the `JavaModuleOptions`(`Spark runtime options`) so that we can improve `performance` by using `hardware-accelerated BLAS operations` by default. 
After this patch(add `--add-modules=jdk.incubator.vector` to the `JavaModuleOptions`), as follows: https://github.com/apache/spark/assets/15246973/da7aa494-0d3c-4c60-9991-e7cd29a1cec5;> 2.`jdk.incubator.foreign` and `-Dforeign.restricted=warn` A.First introduction: https://github.com/apache/spark/pull/32253 https://github.com/apache/spark/pull/32253/files#diff-6f545c33f2fcc975200bf208c900a600a593ce6b170180f81e2f93b3efb6cb3e https://github.com/apache/spark/assets/15246973/3f526019-c389-4e60-ab2a-f8e99cfb;> Use `dev.ludovic.netlib:blas:1.3.2`, the class `ForeignLinkerBLAS` uses `jdk.incubator.foreign.*` in this version, so we need to add `jdk.incubator.foreign` and `-Dforeign.restricted=warn` to `SparkBuild.scala` https://github.com/apache/spark/pull/32253/files#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8 https://github.com/apache/spark/assets/15246973/4fd35e96-0da2-4456-a3f6-6b57ad2e9b64;> https://github.com/luhenry/netlib/blob/v1.3.2/blas/src/main/java/dev/ludovic/netlib/blas/ForeignLinkerBLAS.java#L36 https://github.com/apache/spark/assets/15246973/4b7e3bd1-4650-4c7d-bdb4-c1761d48d478;> However, with the iterative development of `dev.ludovic.netlib`, `ForeignLinkerBLAS` has experienced one `major` change, as following: https://github.com/luhenry/netlib/commit/48e923c3e5e84560139eb25b3c9df9873c05e41d https://github.com/apache/spark/assets/15246973/7ba30b19-00c7-4cc4-bea7-a6ab4b326ad8;> From now on (V3.0.0), `jdk.incubator.foreign.*` will not be used in `dev.ludovic.netlib` Currently, Spark has used the `dev.ludovic.netlib` of version `v3.0.3`. In this version, `ForeignLinkerBLAS` has be removed. https://github.com/apache/spark/blob/master/pom.xml#L191 Double check (`jdk.incubator.foreign` cannot be found in the `netlib` source code): https://github.com/apache/spark/assets/15246973/5c6c6d73-6a5d-427a-9fb4-f626f02335ca;> So we can completely remove options `jdk.incubator.foreign` and `-Dforeign.restricted=warn`. B.For JDK 21 (PS: This is to explain the historical reasons for the differences between the current code logic and the initial ones) (Just because `Spark` made changes to support `JDK 21`) https://issues.apache.org/jira/browse/SPARK-44088 https://github.com/apache/spark/assets/15246973/34e7e7e8-4e72-470e-abc0-d79406ad25e5;> ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test - Pass GA. ### Was this patch authored or
(spark) branch master updated: [SPARK-47408][SQL] Fix mathExpressions that use StringType
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b623601910a3 [SPARK-47408][SQL] Fix mathExpressions that use StringType b623601910a3 is described below commit b623601910a37c863edac56d18e79a44b93c5b36 Author: Mihailo Milosevic AuthorDate: Fri Apr 26 19:48:27 2024 -0700 [SPARK-47408][SQL] Fix mathExpressions that use StringType ### What changes were proposed in this pull request? Support more functions that use strings with collations. ### Why are the changes needed? Hex, Unhex, Conv are widely used and need to be enabled wih collations ### Does this PR introduce _any_ user-facing change? Yes, enabled more functions. ### How was this patch tested? With new tests in `CollationSQLExpressionsSuite.scala`. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46227 from mihailom-db/SPARK-47408. Lead-authored-by: Mihailo Milosevic Co-authored-by: Uros Bojanic <157381213+uros...@users.noreply.github.com> Signed-off-by: Dongjoon Hyun --- .../sql/catalyst/expressions/mathExpressions.scala | 21 ++-- .../catalyst/expressions/stringExpressions.scala | 2 +- .../spark/sql/CollationSQLExpressionsSuite.scala | 124 + 3 files changed, 138 insertions(+), 9 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala index 0c09e9be12e9..dc50c18f2ebb 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala @@ -30,6 +30,7 @@ import org.apache.spark.sql.catalyst.expressions.codegen.Block._ import org.apache.spark.sql.catalyst.util.{MathUtils, NumberConverter, TypeUtils} import org.apache.spark.sql.errors.{QueryCompilationErrors, QueryExecutionErrors} import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.internal.types.StringTypeAnyCollation import org.apache.spark.sql.types._ import org.apache.spark.unsafe.types.UTF8String @@ -450,8 +451,9 @@ case class Conv( override def first: Expression = numExpr override def second: Expression = fromBaseExpr override def third: Expression = toBaseExpr - override def inputTypes: Seq[AbstractDataType] = Seq(StringType, IntegerType, IntegerType) - override def dataType: DataType = StringType + override def inputTypes: Seq[AbstractDataType] = +Seq(StringTypeAnyCollation, IntegerType, IntegerType) + override def dataType: DataType = first.dataType override def nullable: Boolean = true override def nullSafeEval(num: Any, fromBase: Any, toBase: Any): Any = { @@ -1002,7 +1004,7 @@ case class Bin(child: Expression) extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant with Serializable { override def inputTypes: Seq[DataType] = Seq(LongType) - override def dataType: DataType = StringType + override def dataType: DataType = SQLConf.get.defaultStringType protected override def nullSafeEval(input: Any): Any = UTF8String.fromString(jl.Long.toBinaryString(input.asInstanceOf[Long])) @@ -1108,21 +1110,24 @@ case class Hex(child: Expression) extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant { override def inputTypes: Seq[AbstractDataType] = -Seq(TypeCollection(LongType, BinaryType, StringType)) 
+Seq(TypeCollection(LongType, BinaryType, StringTypeAnyCollation)) - override def dataType: DataType = StringType + override def dataType: DataType = child.dataType match { +case st: StringType => st +case _ => SQLConf.get.defaultStringType + } protected override def nullSafeEval(num: Any): Any = child.dataType match { case LongType => Hex.hex(num.asInstanceOf[Long]) case BinaryType => Hex.hex(num.asInstanceOf[Array[Byte]]) -case StringType => Hex.hex(num.asInstanceOf[UTF8String].getBytes) +case _: StringType => Hex.hex(num.asInstanceOf[UTF8String].getBytes) } override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { nullSafeCodeGen(ctx, ev, (c) => { val hex = Hex.getClass.getName.stripSuffix("$") s"${ev.value} = " + (child.dataType match { -case StringType => s"""$hex.hex($c.getBytes());""" +case _: StringType => s"""$hex.hex($c.getBytes());""" case _ => s"""$hex.hex($c);""" }) }) @@ -1149,7 +1154,7 @@ case class Unhex(child: Expression, failOnError: Boolean
(spark-kubernetes-operator) branch main updated: [SPARK-48015] Update `build.gradle` to fix deprecation warnings
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new 167047a [SPARK-48015] Update `build.gradle` to fix deprecation warnings 167047a is described below commit 167047abed12ea8e6d709dbb3c6c326330d5787e Author: Dongjoon Hyun AuthorDate: Fri Apr 26 14:58:08 2024 -0700 [SPARK-48015] Update `build.gradle` to fix deprecation warnings ### What changes were proposed in this pull request? This PR aims to update `build.gradle` to fix deprecation warnings. ### Why are the changes needed? **AFTER** ``` $ ./gradlew build --warning-mode all > Configure project :spark-operator-api Updating PrinterColumns for generated CRD BUILD SUCCESSFUL in 331ms 16 actionable tasks: 16 up-to-date ``` **BEFORE** ``` $ ./gradlew build --warning-mode all > Configure project : Build file '/Users/dongjoon/APACHE/spark-kubernetes-operator/build.gradle': line 20 The org.gradle.api.plugins.JavaPluginConvention type has been deprecated. This is scheduled to be removed in Gradle 9.0. Consult the upgrading guide for further information: https://docs.gradle.org/8.7/userguide/upgrading_version_8.html#java_convention_deprecation at build_1ab30mf3g41rlj3ezxkowdftr$_run_closure1.doCall$original(/Users/dongjoon/APACHE/spark-kubernetes-operator/build.gradle:20) (Run with --stacktrace to get the full stack trace of this deprecation warning.) at build_1ab30mf3g41rlj3ezxkowdftr.run(/Users/dongjoon/APACHE/spark-kubernetes-operator/build.gradle:16) (Run with --stacktrace to get the full stack trace of this deprecation warning.) Build file '/Users/dongjoon/APACHE/spark-kubernetes-operator/build.gradle': line 21 The org.gradle.api.plugins.JavaPluginConvention type has been deprecated. This is scheduled to be removed in Gradle 9.0. Consult the upgrading guide for further information: https://docs.gradle.org/8.7/userguide/upgrading_version_8.html#java_convention_deprecation at build_1ab30mf3g41rlj3ezxkowdftr$_run_closure1.doCall$original(/Users/dongjoon/APACHE/spark-kubernetes-operator/build.gradle:21) (Run with --stacktrace to get the full stack trace of this deprecation warning.) at build_1ab30mf3g41rlj3ezxkowdftr.run(/Users/dongjoon/APACHE/spark-kubernetes-operator/build.gradle:16) (Run with --stacktrace to get the full stack trace of this deprecation warning.) Build file '/Users/dongjoon/APACHE/spark-kubernetes-operator/build.gradle': line 25 The RepositoryHandler.jcenter() method has been deprecated. This is scheduled to be removed in Gradle 9.0. JFrog announced JCenter's sunset in February 2021. Use mavenCentral() instead. Consult the upgrading guide for further information: https://docs.gradle.org/8.7/userguide/upgrading_version_6.html#jcenter_deprecation at build_1ab30mf3g41rlj3ezxkowdftr$_run_closure1$_closure2.doCall$original(/Users/dongjoon/APACHE/spark-kubernetes-operator/build.gradle:25) (Run with --stacktrace to get the full stack trace of this deprecation warning.) at build_1ab30mf3g41rlj3ezxkowdftr$_run_closure1.doCall$original(/Users/dongjoon/APACHE/spark-kubernetes-operator/build.gradle:23) (Run with --stacktrace to get the full stack trace of this deprecation warning.) > Configure project :spark-operator-api Updating PrinterColumns for generated CRD BUILD SUCCESSFUL in 353ms 16 actionable tasks: 16 up-to-date ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? 
Manually build with `--warning-mode all`. ``` $ ./gradlew build --warning-mode all > Configure project :spark-operator-api Updating PrinterColumns for generated CRD BUILD SUCCESSFUL in 331ms 16 actionable tasks: 16 up-to-date ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #9 from dongjoon-hyun/SPARK-48015. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- build.gradle | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/build.gradle b/build.gradle index ed54f7b..a6c1701 100644 --- a/build.gradle +++ b/build.gradle @@ -17,12 +17,14 @@ subprojects { apply plugin: 'idea' apply plugin: 'eclipse' apply plugin: 'java' - sourceCompatibility = 17 - targetCompatibility = 17 + + java { +sourceCompatibility = 17 +targetCompatibility = 17 + } repositories { mavenCentral() -jcenter() } apply plugin:
(spark-kubernetes-operator) branch main updated: [SPARK-47950] Add Java API Module for Spark Operator
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new 28ff3e0 [SPARK-47950] Add Java API Module for Spark Operator 28ff3e0 is described below commit 28ff3e069e80bffa2a3be69fc4905ad3a0f76fd5 Author: zhou-jiang AuthorDate: Fri Apr 26 14:18:09 2024 -0700 [SPARK-47950] Add Java API Module for Spark Operator ### What changes were proposed in this pull request? This PR adds Java API library for Spark Operator, with the ability to generate yaml spec. ### Why are the changes needed? Spark Operator API refers to the CustomResourceDefinition(https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/) that represents the spec for Spark Application in k8s. This module would be used by operator controller and reconciler. It can also serve external services that access k8s server with Java library. ### Does this PR introduce _any_ user-facing change? No API changes in Apache Spark core API. Spark Operator API is proposed. To view generate SparkApplication spec yaml, use ``` ./gradlew :spark-operator-api:finalizeGeneratedCRD ``` (this requires yq to be installed for patching additional printer columns) Generated yaml file would be located at ``` spark-operator-api/build/classes/java/main/META-INF/fabric8/sparkapplications.org.apache.spark-v1.yml ``` For more details, please also refer `spark-operator-docs/spark_application.md` ### How was this patch tested? This is tested locally. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #8 from jiangzho/api. Authored-by: zhou-jiang Signed-off-by: Dongjoon Hyun --- .github/.licenserc.yaml| 1 + build.gradle | 2 + dev/.rat-excludes | 2 + gradle.properties | 16 ++ settings.gradle| 2 + spark-operator-api/build.gradle| 32 .../apache/spark/k8s/operator/BaseResource.java| 36 + .../org/apache/spark/k8s/operator/Constants.java | 82 ++ .../spark/k8s/operator/SparkApplication.java | 57 +++ .../spark/k8s/operator/SparkApplicationList.java | 26 +++ .../k8s/operator/decorators/ResourceDecorator.java | 26 +++ .../apache/spark/k8s/operator/diff/Diffable.java | 22 +++ .../spark/k8s/operator/spec/ApplicationSpec.java | 57 +++ .../operator/spec/ApplicationTimeoutConfig.java| 66 .../k8s/operator/spec/ApplicationTolerations.java | 45 ++ .../operator/spec/BaseApplicationTemplateSpec.java | 38 + .../apache/spark/k8s/operator/spec/BaseSpec.java | 36 + .../spark/k8s/operator/spec/DeploymentMode.java| 25 +++ .../spark/k8s/operator/spec/InstanceConfig.java| 68 .../k8s/operator/spec/ResourceRetainPolicy.java| 39 + .../spark/k8s/operator/spec/RestartConfig.java | 39 + .../spark/k8s/operator/spec/RestartPolicy.java | 39 + .../spark/k8s/operator/spec/RuntimeVersions.java | 40 + .../operator/status/ApplicationAttemptSummary.java | 53 ++ .../k8s/operator/status/ApplicationState.java | 50 ++ .../operator/status/ApplicationStateSummary.java | 151 + .../k8s/operator/status/ApplicationStatus.java | 170 .../spark/k8s/operator/status/AttemptInfo.java | 44 + .../k8s/operator/status/BaseAttemptSummary.java| 37 + .../spark/k8s/operator/status/BaseState.java | 37 + .../k8s/operator/status/BaseStateSummary.java | 29 .../spark/k8s/operator/status/BaseStatus.java | 64 .../spark/k8s/operator/utils/ModelUtils.java | 110 + .../src/main/resources/printer-columns.sh | 14 +- .../k8s/operator/spec/ApplicationSpecTest.java | 42 + 
.../spark/k8s/operator/spec/RestartPolicyTest.java | 62 +++ .../k8s/operator/status/ApplicationStatusTest.java | 178 + .../spark/k8s/operator/utils/ModelUtilsTest.java | 124 ++ 38 files changed, 1956 insertions(+), 5 deletions(-) diff --git a/.github/.licenserc.yaml b/.github/.licenserc.yaml index 26ac0c1..d1d65e2 100644 --- a/.github/.licenserc.yaml +++ b/.github/.licenserc.yaml @@ -16,5 +16,6 @@ header: - '.asf.yaml' - '**/*.gradle' - gradlew +- 'build/**' comment: on-failure diff --git a/build.gradle b/build.gradle index f64212b..ed54f7b 100644 --- a/build.gradle +++ b/build.gradle @@ -72,6 +72,8 @@ subprojects
(spark) branch master updated: [SPARK-48011][CORE] Store LogKey name as a value to avoid generating new string instances
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2b2a33cc35a8 [SPARK-48011][CORE] Store LogKey name as a value to avoid generating new string instances 2b2a33cc35a8 is described below commit 2b2a33cc35a880fafc569c707674313a56c15811 Author: Gengliang Wang AuthorDate: Fri Apr 26 13:25:15 2024 -0700 [SPARK-48011][CORE] Store LogKey name as a value to avoid generating new string instances ### What changes were proposed in this pull request? Store LogKey name as a value to avoid generating new string instances ### Why are the changes needed? To save memory usage on getting the names of `LogKey`s. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests ### Was this patch authored or co-authored using generative AI tooling? No Closes #46249 from gengliangwang/addKeyName. Authored-by: Gengliang Wang Signed-off-by: Dongjoon Hyun --- common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala | 6 +- common/utils/src/main/scala/org/apache/spark/internal/Logging.scala | 4 +--- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala index 04990ddc4c9d..2ca80a496ccb 100644 --- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala +++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala @@ -16,10 +16,14 @@ */ package org.apache.spark.internal +import java.util.Locale + /** * All structured logging `keys` used in `MDC` must be extends `LogKey` */ -trait LogKey +trait LogKey { + val name: String = this.toString.toLowerCase(Locale.ROOT) +} /** * Various keys used for mapped diagnostic contexts(MDC) in logging. diff --git a/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala b/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala index 085b22bee5f3..24a60f88c24a 100644 --- a/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala +++ b/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala @@ -17,8 +17,6 @@ package org.apache.spark.internal -import java.util.Locale - import scala.jdk.CollectionConverters._ import org.apache.logging.log4j.{CloseableThreadContext, Level, LogManager} @@ -110,7 +108,7 @@ trait Logging { val value = if (mdc.value != null) mdc.value.toString else null sb.append(value) if (Logging.isStructuredLoggingEnabled) { - context.put(mdc.key.toString.toLowerCase(Locale.ROOT), value) + context.put(mdc.key.name, value) } if (processedParts.hasNext) { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48010][SQL] Avoid repeated calls to conf.resolver in resolveExpression
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6098bd944f66 [SPARK-48010][SQL] Avoid repeated calls to conf.resolver in resolveExpression 6098bd944f66 is described below commit 6098bd944f6603546601a9d5b5da5f756ce2257c Author: Nikhil Sheoran <125331115+nikhilsheoran...@users.noreply.github.com> AuthorDate: Fri Apr 26 11:23:12 2024 -0700 [SPARK-48010][SQL] Avoid repeated calls to conf.resolver in resolveExpression ### What changes were proposed in this pull request? - This PR instead of calling `conf.resolver` for each call in `resolveExpression`, reuses the `resolver` obtained once. ### Why are the changes needed? - Consider a view with large number of columns (~1000s). When looking at the RuleExecutor metrics and flamegraph for a query that only does `DESCRIBE SELECT * FROM large_view`, observed that a large fraction of time is spent in `ResolveReferences` and `ResolveRelations`. Of these, the majority of the driver time went in initializing the `conf` to obtain `conf.resolver` for each of the column in the view. - Since, the same `conf` is used in each of these calls, calling the `conf.resolver` again and again can be avoided by initializing it once and reusing the same resolver. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Created a dummy view with 3000 columns. - Observed the `RuleExecutor` metrics using `RuleExecutor.dumpTimeSpent()`. - `RuleExecutor` metrics before this change (after multiple runs) ``` === Metrics of Analyzer/Optimizer Rules === Total number of runs: 1483 Total time: 8.026801698 seconds Rule Effective Time / Total Time Effective Runs / Total Runs org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations 4060159342 / 4062186814 1 / 6 org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences 3789405037 / 3809203288 2 / 6 org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$CombinedTypeCoercionRule 0 / 207411640 / 6 org.apache.spark.sql.catalyst.analysis.ResolveTimeZone 17800584 / 19431350 1 / 6 org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast 15036018 / 15060440 1 / 6 org.apache.spark.sql.catalyst.analysis.UpdateAttributeNullability 0 / 149298100 / 7 ``` - `RuleExecutor` metrics after this change (after multiple runs) ``` === Metrics of Analyzer/Optimizer Rules === Total number of runs: 1483 Total time: 2.892630859 seconds Rule Effective Time / Total Time Effective Runs / Total Runs org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations 1490357745 / 1492398446 1 / 6 org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences 1212205822 / 1241729981 2 / 6 org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$CombinedTypeCoercionRule 0 / 238571610 / 6 org.apache.spark.sql.catalyst.analysis.ResolveTimeZone 16603250 / 18806065 1 / 6 org.apache.spark.sql.catalyst.analysis.UpdateAttributeNullability 0 / 167493060 / 7 org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast 11158299 / 11183593 1 / 6 ``` ### Was this patch authored or co-authored using generative AI tooling? No Closes #46248 from nikhilsheoran-db/SPARK-48010. 
Authored-by: Nikhil Sheoran <125331115+nikhilsheoran...@users.noreply.github.com> Signed-off-by: Dongjoon Hyun --- .../spark/sql/catalyst/analysis/ColumnResolutionHelper.scala | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala index 6e27192ead32..c10e000a098c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala +++ b/sql/
(spark) branch master updated: [SPARK-48005][PS][CONNECT][TESTS] Enable `DefaultIndexParityTests.test_index_distributed_sequence_cleanup`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 78b19d5af08e [SPARK-48005][PS][CONNECT][TESTS] Enable `DefaultIndexParityTests.test_index_distributed_sequence_cleanup` 78b19d5af08e is described below commit 78b19d5af08ea772eaea9c13b7b984a13294 Author: Ruifeng Zheng AuthorDate: Fri Apr 26 09:58:54 2024 -0700 [SPARK-48005][PS][CONNECT][TESTS] Enable `DefaultIndexParityTests.test_index_distributed_sequence_cleanup` ### What changes were proposed in this pull request? Enable `DefaultIndexParityTests. test_index_distributed_sequence_cleanup` ### Why are the changes needed? this test requires `sc` access, can be enabled in `Spark Connect with JVM` mode ### Does this PR introduce _any_ user-facing change? no, test only ### How was this patch tested? ci, also manually test: ``` python/run-tests -k --python-executables python3 --testnames 'pyspark.pandas.tests.connect.indexes.test_parity_default DefaultIndexParityTests.test_index_distributed_sequence_cleanup' Running PySpark tests. Output is in /Users/ruifeng.zheng/Dev/spark/python/unit-tests.log Will test against the following Python executables: ['python3'] Will test the following Python tests: ['pyspark.pandas.tests.connect.indexes.test_parity_default DefaultIndexParityTests.test_index_distributed_sequence_cleanup'] python3 python_implementation is CPython python3 version is: Python 3.12.2 Starting test(python3): pyspark.pandas.tests.connect.indexes.test_parity_default DefaultIndexParityTests.test_index_distributed_sequence_cleanup (temp output: /Users/ruifeng.zheng/Dev/spark/python/target/ccd3da45-f774-4f5f-8283-a91a8ee12212/python3__pyspark.pandas.tests.connect.indexes.test_parity_default_DefaultIndexParityTests.test_index_distributed_sequence_cleanup__p9yved3e.log) Finished test(python3): pyspark.pandas.tests.connect.indexes.test_parity_default DefaultIndexParityTests.test_index_distributed_sequence_cleanup (16s) Tests passed in 16 seconds ``` ### Was this patch authored or co-authored using generative AI tooling? no Closes #46242 from zhengruifeng/enable_test_index_distributed_sequence_cleanup. 
Authored-by: Ruifeng Zheng Signed-off-by: Dongjoon Hyun --- .../pyspark/pandas/tests/connect/indexes/test_parity_default.py | 3 ++- python/pyspark/pandas/tests/indexes/test_default.py | 8 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/python/pyspark/pandas/tests/connect/indexes/test_parity_default.py b/python/pyspark/pandas/tests/connect/indexes/test_parity_default.py index d6f0cadbf0cd..4240eb8fdbc8 100644 --- a/python/pyspark/pandas/tests/connect/indexes/test_parity_default.py +++ b/python/pyspark/pandas/tests/connect/indexes/test_parity_default.py @@ -19,6 +19,7 @@ import unittest from pyspark.pandas.tests.indexes.test_default import DefaultIndexTestsMixin from pyspark.testing.connectutils import ReusedConnectTestCase from pyspark.testing.pandasutils import PandasOnSparkTestUtils +from pyspark.util import is_remote_only class DefaultIndexParityTests( @@ -26,7 +27,7 @@ class DefaultIndexParityTests( PandasOnSparkTestUtils, ReusedConnectTestCase, ): -@unittest.skip("Test depends on SparkContext which is not supported from Spark Connect.") +@unittest.skipIf(is_remote_only(), "Requires JVM access") def test_index_distributed_sequence_cleanup(self): super().test_index_distributed_sequence_cleanup() diff --git a/python/pyspark/pandas/tests/indexes/test_default.py b/python/pyspark/pandas/tests/indexes/test_default.py index 3d19eb407b42..5cd9fae76dfb 100644 --- a/python/pyspark/pandas/tests/indexes/test_default.py +++ b/python/pyspark/pandas/tests/indexes/test_default.py @@ -44,7 +44,7 @@ class DefaultIndexTestsMixin: "compute.default_index_type", "distributed-sequence" ), ps.option_context("compute.ops_on_diff_frames", True): with ps.option_context("compute.default_index_cache", "LOCAL_CHECKPOINT"): -cached_rdd_ids = [rdd_id for rdd_id in self.spark._jsc.getPersistentRDDs()] +cached_rdd_ids = [rdd_id for rdd_id in self._legacy_sc._jsc.getPersistentRDDs()] psdf1 = ( self.spark.range(0, 100, 1, 10).withColumn("Key", F.col("id") % 33).pandas_api() @@ -61,13 +61,13 @@ class DefaultIndexTestsMixin: self.assertTrue( any( rdd_id not in cached_rdd_ids -for rdd_id in self.spark._jsc.getPers
(spark) branch master updated: [SPARK-48007][BUILD][TESTS] Upgrade `mssql.jdbc` to `12.6.1.jre11`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4ee528f9b29f [SPARK-48007][BUILD][TESTS] Upgrade `mssql.jdbc` to `12.6.1.jre11` 4ee528f9b29f is described below commit 4ee528f9b29f5cd52b70b27a4b8c250c8ca1a17c Author: Kent Yao AuthorDate: Fri Apr 26 08:08:57 2024 -0700 [SPARK-48007][BUILD][TESTS] Upgrade `mssql.jdbc` to `12.6.1.jre11` ### What changes were proposed in this pull request? This PR upgrades mssql.jdbc.version to 12.6.1.jre11, https://mvnrepository.com/artifact/com.microsoft.sqlserver/mssql-jdbc. ### Why are the changes needed? test dependency management ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46244 from yaooqinn/SPARK-48007. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/jdbc/MsSQLServerDatabaseOnDocker.scala | 3 ++- pom.xml| 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSQLServerDatabaseOnDocker.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSQLServerDatabaseOnDocker.scala index b351b2ad1ec7..61530f713eb8 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSQLServerDatabaseOnDocker.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSQLServerDatabaseOnDocker.scala @@ -28,5 +28,6 @@ class MsSQLServerDatabaseOnDocker extends DatabaseOnDocker { override val jdbcPort: Int = 1433 override def getJdbcUrl(ip: String, port: Int): String = -s"jdbc:sqlserver://$ip:$port;user=sa;password=Sapass123;" +s"jdbc:sqlserver://$ip:$port;user=sa;password=Sapass123;" + + "encrypt=true;trustServerCertificate=true" } diff --git a/pom.xml b/pom.xml index 9c8f8fbb2ab0..b916659fdbfa 100644 --- a/pom.xml +++ b/pom.xml @@ -325,7 +325,7 @@ 8.3.0 42.7.3 11.5.9.0 -9.4.1.jre8 +12.6.1.jre11 23.3.0.23.09 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47991][SQL][TEST] Arrange the test cases for window frames and window functions
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ea4b7a242910 [SPARK-47991][SQL][TEST] Arrange the test cases for window frames and window functions ea4b7a242910 is described below commit ea4b7a2429106067eb30b6b47bf7c42059053d31 Author: beliefer AuthorDate: Thu Apr 25 20:54:27 2024 -0700 [SPARK-47991][SQL][TEST] Arrange the test cases for window frames and window functions ### What changes were proposed in this pull request? This PR propose to arrange the test cases for window frames and window functions. ### Why are the changes needed? Currently, `DataFrameWindowFramesSuite` and `DataFrameWindowFunctionsSuite` have different testing objectives. The comments for the above two classes are as follows: `DataFrameWindowFramesSuite` is `Window frame testing for DataFrame API.` `DataFrameWindowFunctionsSuite` is `Window function testing for DataFrame API.` But there are some test cases for window frame placed into `DataFrameWindowFunctionsSuite`. ### Does this PR introduce _any_ user-facing change? 'No'. Just arrange the test cases for window frames and window functions. ### How was this patch tested? GA ### Was this patch authored or co-authored using generative AI tooling? 'No'. Closes #46226 from beliefer/SPARK-47991. Authored-by: beliefer Signed-off-by: Dongjoon Hyun --- .../spark/sql/DataFrameWindowFramesSuite.scala | 48 ++ .../spark/sql/DataFrameWindowFunctionsSuite.scala | 48 -- 2 files changed, 48 insertions(+), 48 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFramesSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFramesSuite.scala index fe1393af8174..95f4cc78d156 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFramesSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFramesSuite.scala @@ -32,6 +32,28 @@ import org.apache.spark.sql.types.CalendarIntervalType class DataFrameWindowFramesSuite extends QueryTest with SharedSparkSession { import testImplicits._ + test("reuse window partitionBy") { +val df = Seq((1, "1"), (2, "2"), (1, "1"), (2, "2")).toDF("key", "value") +val w = Window.partitionBy("key").orderBy("value") + +checkAnswer( + df.select( +lead("key", 1).over(w), +lead("value", 1).over(w)), + Row(1, "1") :: Row(2, "2") :: Row(null, null) :: Row(null, null) :: Nil) + } + + test("reuse window orderBy") { +val df = Seq((1, "1"), (2, "2"), (1, "1"), (2, "2")).toDF("key", "value") +val w = Window.orderBy("value").partitionBy("key") + +checkAnswer( + df.select( +lead("key", 1).over(w), +lead("value", 1).over(w)), + Row(1, "1") :: Row(2, "2") :: Row(null, null) :: Row(null, null) :: Nil) + } + test("lead/lag with empty data frame") { val df = Seq.empty[(Int, String)].toDF("key", "value") val window = Window.partitionBy($"key").orderBy($"value") @@ -570,4 +592,30 @@ class DataFrameWindowFramesSuite extends QueryTest with SharedSparkSession { } } } + + test("SPARK-34227: WindowFunctionFrame should clear its states during preparation") { +// This creates a single partition dataframe with 3 records: +// "a", 0, null +// "a", 1, "x" +// "b", 0, null +val df = spark.range(0, 3, 1, 1).select( + when($"id" < 2, lit("a")).otherwise(lit("b")).as("key"), + ($"id" % 2).cast("int").as("order"), + when($"id" % 2 === 0, lit(null)).otherwise(lit("x")).as("value")) + +val window1 = 
Window.partitionBy($"key").orderBy($"order") + .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing) +val window2 = Window.partitionBy($"key").orderBy($"order") + .rowsBetween(Window.unboundedPreceding, Window.currentRow) +checkAnswer( + df.select( +$"key", +$"order", +nth_value($"value", 1, ignoreNulls = true).over(window1), +nth_value($"value", 1, ignoreNulls = true).over(window2)), + Seq( +Row("a", 0, "x", null), +Row("a", 1, "x"
(spark) branch master updated: [SPARK-47933][CONNECT][PYTHON][FOLLOW-UP] Avoid referencing _to_seq in `pyspark-connect`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 79357c8ccd22 [SPARK-47933][CONNECT][PYTHON][FOLLOW-UP] Avoid referencing _to_seq in `pyspark-connect` 79357c8ccd22 is described below commit 79357c8ccd22729a074c42f700544e7e3f023a8d Author: Hyukjin Kwon AuthorDate: Thu Apr 25 14:49:21 2024 -0700 [SPARK-47933][CONNECT][PYTHON][FOLLOW-UP] Avoid referencing _to_seq in `pyspark-connect` ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/46155 that removes the reference of `_to_seq` that `pyspark-connect` package does not have. ### Why are the changes needed? To recover the CI https://github.com/apache/spark/actions/runs/8821919392/job/24218893631 ### Does this PR introduce _any_ user-facing change? No, the main change has not been released out yet. ### How was this patch tested? Manually tested. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46229 from HyukjinKwon/SPARK-47933-followuptmp. Authored-by: Hyukjin Kwon Signed-off-by: Dongjoon Hyun --- python/pyspark/sql/group.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/python/pyspark/sql/group.py b/python/pyspark/sql/group.py index d26e23bc7160..34c3531c8302 100644 --- a/python/pyspark/sql/group.py +++ b/python/pyspark/sql/group.py @@ -43,9 +43,9 @@ def dfapi(f: Callable[..., DataFrame]) -> Callable[..., DataFrame]: def df_varargs_api(f: Callable[..., DataFrame]) -> Callable[..., DataFrame]: -from pyspark.sql.classic.column import _to_seq - def _api(self: "GroupedData", *cols: str) -> DataFrame: +from pyspark.sql.classic.column import _to_seq + name = f.__name__ jdf = getattr(self._jgd, name)(_to_seq(self.session._sc, cols)) return DataFrame(jdf, self.session) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-45425][DOCS][FOLLOWUP] Add a migration guide for TINYINT type mapping change
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e1d021214c61 [SPARK-45425][DOCS][FOLLOWUP] Add a migration guide for TINYINT type mapping change e1d021214c61 is described below commit e1d021214c6130588e69dfa05e0391d89b463f9d Author: Kent Yao AuthorDate: Thu Apr 25 08:19:40 2024 -0700 [SPARK-45425][DOCS][FOLLOWUP] Add a migration guide for TINYINT type mapping change ### What changes were proposed in this pull request? Followup of SPARK-45425, adding migration guide. ### Why are the changes needed? migration guide ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing build ### Was this patch authored or co-authored using generative AI tooling? no Closes #46224 from yaooqinn/SPARK-45425. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- docs/sql-migration-guide.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 9b189eee6ad1..024423fb145a 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -47,6 +47,7 @@ license: | - Since Spark 4.0, MySQL JDBC datasource will read BIT(n > 1) as BinaryType, while in Spark 3.5 and previous, read as LongType. To restore the previous behavior, set `spark.sql.legacy.mysql.bitArrayMapping.enabled` to `true`. - Since Spark 4.0, MySQL JDBC datasource will write ShortType as SMALLINT, while in Spark 3.5 and previous, write as INTEGER. To restore the previous behavior, you can replace the column with IntegerType whenever before writing. - Since Spark 4.0, Oracle JDBC datasource will write TimestampType as TIMESTAMP WITH LOCAL TIME ZONE, while in Spark 3.5 and previous, write as TIMESTAMP. To restore the previous behavior, set `spark.sql.legacy.oracle.timestampMapping.enabled` to `true`. +- Since Spark 4.0, MsSQL Server JDBC datasource will read TINYINT as ShortType, while in Spark 3.5 and previous, read as IntegerType. To restore the previous behavior, set `spark.sql.legacy.mssqlserver.numericMapping.enabled` to `true`. - Since Spark 4.0, The default value for `spark.sql.legacy.ctePrecedencePolicy` has been changed from `EXCEPTION` to `CORRECTED`. Instead of raising an error, inner CTE definitions take precedence over outer definitions. - Since Spark 4.0, The default value for `spark.sql.legacy.timeParserPolicy` has been changed from `EXCEPTION` to `CORRECTED`. Instead of raising an `INCONSISTENT_BEHAVIOR_CROSS_VERSION` error, `CANNOT_PARSE_TIMESTAMP` will be raised if ANSI mode is enable. `NULL` will be returned if ANSI mode is disabled. See [Datetime Patterns for Formatting and Parsing](sql-ref-datetime-pattern.html). - Since Spark 4.0, A bug falsely allowing `!` instead of `NOT` when `!` is not a prefix operator has been fixed. Clauses such as `expr ! IN (...)`, `expr ! BETWEEN ...`, or `col ! NULL` now raise syntax errors. To restore the previous behavior, set `spark.sql.legacy.bangEqualsNot` to `true`. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
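A hedged sketch of the restore path the new migration-guide entry describes; the JDBC URL, table, and credentials are placeholders rather than values from the patch.

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Restore the pre-4.0 mapping of SQL Server TINYINT to IntegerType.
spark.conf.set("spark.sql.legacy.mssqlserver.numericMapping.enabled", "true")

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://example-host:1433;databaseName=exampledb")  # placeholder URL
    .option("dbtable", "dbo.example_table")                                      # placeholder table
    .option("user", "example_user")                                              # placeholder credentials
    .option("password", "example_password")
    .load()
)
df.printSchema()  # TINYINT columns should now read back as int instead of smallint
```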
(spark) branch master updated (de5c512e0179 -> 287d02073929)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from de5c512e0179 [SPARK-47987][PYTHON][CONNECT][TESTS] Enable `ArrowParityTests.test_createDataFrame_empty_partition` add 287d02073929 [SPARK-47989][SQL] MsSQLServer: Fix the scope of spark.sql.legacy.mssqlserver.numericMapping.enabled No new revisions were added by this update. Summary of changes: .../sql/jdbc/MsSqlServerIntegrationSuite.scala | 177 +++-- .../org/apache/spark/sql/internal/SQLConf.scala| 2 +- .../apache/spark/sql/jdbc/MsSqlServerDialect.scala | 29 ++-- 3 files changed, 104 insertions(+), 104 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47987][PYTHON][CONNECT][TESTS] Enable `ArrowParityTests.test_createDataFrame_empty_partition`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new de5c512e0179 [SPARK-47987][PYTHON][CONNECT][TESTS] Enable `ArrowParityTests.test_createDataFrame_empty_partition` de5c512e0179 is described below commit de5c512e017965b5c726e254f8969fb17d5c17ea Author: Ruifeng Zheng AuthorDate: Thu Apr 25 08:16:56 2024 -0700 [SPARK-47987][PYTHON][CONNECT][TESTS] Enable `ArrowParityTests.test_createDataFrame_empty_partition` ### What changes were proposed in this pull request? Reenable `ArrowParityTests.test_createDataFrame_empty_partition` We actually already had set up Classic SparkContext `_legacy_sc ` for Spark Connect test, so only need to add `_legacy_sc` in Classic PySpark test. ### Why are the changes needed? to improve test coverage ### Does this PR introduce _any_ user-facing change? no, test only ### How was this patch tested? ci ### Was this patch authored or co-authored using generative AI tooling? no Closes #46220 from zhengruifeng/enable_test_createDataFrame_empty_partition. Authored-by: Ruifeng Zheng Signed-off-by: Dongjoon Hyun --- python/pyspark/sql/tests/connect/test_parity_arrow.py | 4 python/pyspark/sql/tests/test_arrow.py| 4 +++- python/pyspark/testing/sqlutils.py| 1 + 3 files changed, 4 insertions(+), 5 deletions(-) diff --git a/python/pyspark/sql/tests/connect/test_parity_arrow.py b/python/pyspark/sql/tests/connect/test_parity_arrow.py index 93d0b6cf0f5f..8727cc279641 100644 --- a/python/pyspark/sql/tests/connect/test_parity_arrow.py +++ b/python/pyspark/sql/tests/connect/test_parity_arrow.py @@ -24,10 +24,6 @@ from pyspark.testing.pandasutils import PandasOnSparkTestUtils class ArrowParityTests(ArrowTestsMixin, ReusedConnectTestCase, PandasOnSparkTestUtils): -@unittest.skip("Spark Connect does not support Spark Context but the test depends on that.") -def test_createDataFrame_empty_partition(self): -super().test_createDataFrame_empty_partition() - @unittest.skip("Spark Connect does not support fallback.") def test_createDataFrame_fallback_disabled(self): super().test_createDataFrame_fallback_disabled() diff --git a/python/pyspark/sql/tests/test_arrow.py b/python/pyspark/sql/tests/test_arrow.py index 5235e021bae9..03cb35feb994 100644 --- a/python/pyspark/sql/tests/test_arrow.py +++ b/python/pyspark/sql/tests/test_arrow.py @@ -56,6 +56,7 @@ from pyspark.testing.sqlutils import ( ExamplePointUDT, ) from pyspark.errors import ArithmeticException, PySparkTypeError, UnsupportedOperationException +from pyspark.util import is_remote_only if have_pandas: import pandas as pd @@ -830,7 +831,8 @@ class ArrowTestsMixin: pdf = pd.DataFrame({"c1": [1], "c2": ["string"]}) df = self.spark.createDataFrame(pdf) self.assertEqual([Row(c1=1, c2="string")], df.collect()) -self.assertGreater(self.spark.sparkContext.defaultParallelism, len(pdf)) +if not is_remote_only(): +self.assertGreater(self._legacy_sc.defaultParallelism, len(pdf)) def test_toPandas_error(self): for arrow_enabled in [True, False]: diff --git a/python/pyspark/testing/sqlutils.py b/python/pyspark/testing/sqlutils.py index 690d5c37b22e..a0fdada72972 100644 --- a/python/pyspark/testing/sqlutils.py +++ b/python/pyspark/testing/sqlutils.py @@ -258,6 +258,7 @@ class ReusedSQLTestCase(ReusedPySparkTestCase, SQLTestUtils, PySparkErrorTestUti @classmethod def setUpClass(cls): super(ReusedSQLTestCase, cls).setUpClass() +cls._legacy_sc = cls.sc cls.spark = 
SparkSession(cls.sc) cls.tempdir = tempfile.NamedTemporaryFile(delete=False) os.unlink(cls.tempdir.name) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
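For context, the change boils down to the following pattern — a minimal, illustrative sketch (not the actual suite), assuming a test base class that exposes `self.spark` and sets `self._legacy_sc` to the Classic SparkContext the way `pyspark.testing.sqlutils` now does, and using the same `is_remote_only` helper the patch imports:
```python
import pandas as pd
from pyspark.util import is_remote_only


class CreateDataFrameParallelismMixin:
    # self.spark and self._legacy_sc are assumed to be provided by the
    # concrete test base class (Classic or Connect).
    def check_empty_partition_behavior(self):
        pdf = pd.DataFrame({"c1": [1], "c2": ["string"]})
        df = self.spark.createDataFrame(pdf)
        assert df.count() == 1
        # Only Classic PySpark exposes a SparkContext; skip this assertion
        # when running against Spark Connect.
        if not is_remote_only():
            assert self._legacy_sc.defaultParallelism > len(pdf)
```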
(spark) branch master updated: [SPARK-47990][BUILD] Upgrade `zstd-jni` to 1.5.6-3
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5810554ce0fa [SPARK-47990][BUILD] Upgrade `zstd-jni` to 1.5.6-3 5810554ce0fa is described below commit 5810554ce0faba4cb8e7f3ca3dd5812bd2cf179f Author: panbingkun AuthorDate: Thu Apr 25 08:10:04 2024 -0700 [SPARK-47990][BUILD] Upgrade `zstd-jni` to 1.5.6-3 ### What changes were proposed in this pull request? The pr aims to upgrade `zstd-jni` from `1.5.6-2` to `1.5.6-3`. ### Why are the changes needed? 1.This version fix a potential memory leak problem, as follows: https://github.com/apache/spark/assets/15246973/eeae3e7f-0c44-443d-838b-fa39b9e45d64;> 2.https://github.com/luben/zstd-jni/compare/v1.5.6-2...v1.5.6-3 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46225 from panbingkun/SPARK-47990. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index f6adb6d18b85..005cc7bfb435 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -278,4 +278,4 @@ xz/1.9//xz-1.9.jar zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar zookeeper-jute/3.9.2//zookeeper-jute-3.9.2.jar zookeeper/3.9.2//zookeeper-3.9.2.jar -zstd-jni/1.5.6-2//zstd-jni-1.5.6-2.jar +zstd-jni/1.5.6-3//zstd-jni-1.5.6-3.jar diff --git a/pom.xml b/pom.xml index c98514efa356..9c8f8fbb2ab0 100644 --- a/pom.xml +++ b/pom.xml @@ -800,7 +800,7 @@ com.github.luben zstd-jni -1.5.6-2 +1.5.6-3 com.clearspring.analytics - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47979][SQL][TESTS] Use Hive tables explicitly for Hive table capability tests
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0fcced63be99 [SPARK-47979][SQL][TESTS] Use Hive tables explicitly for Hive table capability tests 0fcced63be99 is described below commit 0fcced63be99302593591d29370c00e7c0d73cec Author: Dongjoon Hyun AuthorDate: Wed Apr 24 18:57:29 2024 -0700 [SPARK-47979][SQL][TESTS] Use Hive tables explicitly for Hive table capability tests ### What changes were proposed in this pull request? This PR aims to use `Hive` tables explicitly for Hive table capability tests in `hive` and `hive-thriftserver` module. ### Why are the changes needed? To make Hive test coverage robust by making it independent from Apache Spark configuration changes. ### Does this PR introduce _any_ user-facing change? No, this is a test only change. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46211 from dongjoon-hyun/SPARK-47979. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/sql/hive/thriftserver/UISeleniumSuite.scala | 2 +- .../scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala | 1 + .../scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala | 9 +++-- .../org/apache/spark/sql/hive/execution/HiveQuerySuite.scala | 6 +++--- .../spark/sql/hive/execution/command/ShowCreateTableSuite.scala | 4 5 files changed, 12 insertions(+), 10 deletions(-) diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/UISeleniumSuite.scala b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/UISeleniumSuite.scala index b552611b75d1..2b2cbec41d64 100644 --- a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/UISeleniumSuite.scala +++ b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/UISeleniumSuite.scala @@ -108,7 +108,7 @@ class UISeleniumSuite val baseURL = s"http://$localhost:$uiPort; val queries = Seq( -"CREATE TABLE test_map(key INT, value STRING)", +"CREATE TABLE test_map (key INT, value STRING) USING HIVE", s"LOAD DATA LOCAL INPATH '${TestData.smallKv}' OVERWRITE INTO TABLE test_map") queries.foreach(statement.execute) diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala index 0bc288501a01..b60adfb6f4cf 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala @@ -686,6 +686,7 @@ class HiveClientSuite(version: String) extends HiveVersionSuite(version) { versionSpark.sql( s""" |CREATE TABLE tab(c1 string) + |USING HIVE |location '${tmpDir.toURI.toString}' """.stripMargin) diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala index 241fdd4b9ec5..965db22b78f1 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala @@ -216,7 +216,7 @@ class HiveDDLSuite test("SPARK-22431: alter table tests with nested types") { withTable("t1", "t2", "t3") { - spark.sql("CREATE TABLE t1 (q STRUCT, i1 INT)") + 
spark.sql("CREATE TABLE t1 (q STRUCT, i1 INT) USING HIVE") spark.sql("ALTER TABLE t1 ADD COLUMNS (newcol1 STRUCT<`col1`:STRING, col2:Int>)") val newcol = spark.sql("SELECT * FROM t1").schema.fields(2).name assert("newcol1".equals(newcol)) @@ -2614,7 +2614,7 @@ class HiveDDLSuite "msg" -> "java.lang.UnsupportedOperationException: Unknown field type: void") ) - sql("CREATE TABLE t3 AS SELECT NULL AS null_col") + sql("CREATE TABLE t3 USING HIVE AS SELECT NULL AS null_col") checkAnswer(sql("SELECT * FROM t3"), Row(null)) } @@ -2642,9 +2642,6 @@ class HiveDDLSuite sql("CREATE TABLE t3 (v VOID) USING hive") checkAnswer(sql("SELECT * FROM t3"), Seq.empty) - - sql("CREATE TABLE t4 (v VOID)") - checkAnswer(sql("SELECT * FROM t4"), Seq.empty) } // Create table with void t
(spark) branch branch-3.5 updated: [SPARK-47633][SQL][3.5] Include right-side plan output in `LateralJoin#allAttributes` for more consistent canonicalization
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new ce19bfc10682 [SPARK-47633][SQL][3.5] Include right-side plan output in `LateralJoin#allAttributes` for more consistent canonicalization ce19bfc10682 is described below commit ce19bfc1068229897454c5f5cb78aeb435821bd2 Author: Bruce Robbins AuthorDate: Wed Apr 24 09:48:21 2024 -0700 [SPARK-47633][SQL][3.5] Include right-side plan output in `LateralJoin#allAttributes` for more consistent canonicalization This is a backport of #45763 to branch-3.5. ### What changes were proposed in this pull request? Modify `LateralJoin` to include right-side plan output in `allAttributes`. ### Why are the changes needed? In the following example, the view v1 is cached, but a query of v1 does not use the cache: ``` CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2); CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2); create or replace temp view v1 as select * from t1 join lateral ( select c1 as a, c2 as b from t2) on c1 = a; cache table v1; explain select * from v1; == Physical Plan == AdaptiveSparkPlan isFinalPlan=false +- BroadcastHashJoin [c1#180], [a#173], Inner, BuildRight, false :- LocalTableScan [c1#180, c2#181] +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), [plan_id=113] +- LocalTableScan [a#173, b#174] ``` The canonicalized version of the `LateralJoin` node is not consistent when there is a join condition. For example, for the above query, the join condition is canonicalized as follows: ``` Before canonicalization: Some((c1#174 = a#167)) After canonicalization: Some((none#0 = none#167)) ``` You can see that the `exprId` for the second operand of `EqualTo` is not normalized (it remains 167). That's because the attribute `a` from the right-side plan is not included `allAttributes`. This PR adds right-side attributes to `allAttributes` so that references to right-side attributes in the join condition are normalized during canonicalization. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46190 from bersprockets/lj_canonical_issue_35. 
Authored-by: Bruce Robbins Signed-off-by: Dongjoon Hyun --- .../plans/logical/basicLogicalOperators.scala | 2 ++ .../scala/org/apache/spark/sql/CachedTableSuite.scala | 19 +++ 2 files changed, 21 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala index 58c03ee72d6d..ca2c6a850561 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala @@ -2017,6 +2017,8 @@ case class LateralJoin( joinType: JoinType, condition: Option[Expression]) extends UnaryNode { + override lazy val allAttributes: AttributeSeq = left.output ++ right.plan.output + require(Seq(Inner, LeftOuter, Cross).contains(joinType), s"Unsupported lateral join type $joinType") diff --git a/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala index 8331a3c10fc9..9815cb816c99 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala @@ -1710,4 +1710,23 @@ class CachedTableSuite extends QueryTest with SQLTestUtils } } } + + test("SPARK-47633: Cache hit for lateral join with join condition") { +withTempView("t", "q1") { + sql("create or replace temp view t(c1, c2) as values (0, 1), (1, 2)") + val query = """select * +|from t +|join lateral ( +| select c1 as a, c2 as b +| from t) +|on c1 = a; +|""".stripMargin + sql(s"cache table q1 as $query") + val df = sql(query) + checkAnswer(df, +Row(0, 1, 0, 1) :: Row(1, 2, 1, 2) :: Nil) + assert(getNumInMemoryRelations(df) == 1) +} + + } } - To
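The scenario from the description can also be reproduced end-to-end from PySpark; the sketch below is an illustration (not part of the patch) that caches the lateral-join view and then inspects the plan, which should reuse the cache once the canonicalization fix is in place:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("CREATE OR REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2)")
spark.sql("CREATE OR REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2)")
spark.sql("""
    CREATE OR REPLACE TEMP VIEW v1 AS
    SELECT * FROM t1
    JOIN LATERAL (SELECT c1 AS a, c2 AS b FROM t2) ON c1 = a
""")
spark.sql("CACHE TABLE v1")

# With the fix, the physical plan shows an in-memory table scan instead of
# re-planning the lateral join, matching the expectation in the new test.
spark.sql("SELECT * FROM v1").explain()
```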
(spark) branch master updated (09ed09cb18e7 -> 03d4ea6a707c)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 09ed09cb18e7 [SPARK-47958][TESTS] Change LocalSchedulerBackend to notify scheduler of executor on start add 03d4ea6a707c [SPARK-47974][BUILD] Remove `install_scala` from `build/mvn` No new revisions were added by this update. Summary of changes: .github/workflows/benchmark.yml| 6 ++ .github/workflows/build_and_test.yml | 24 .github/workflows/build_python_connect.yml | 3 +-- .github/workflows/maven_test.yml | 3 +-- build/mvn | 24 5 files changed, 12 insertions(+), 48 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47969][PYTHON][TESTS] Make `test_creation_index` deterministic
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new cb1e1f5cd49a [SPARK-47969][PYTHON][TESTS] Make `test_creation_index` deterministic cb1e1f5cd49a is described below commit cb1e1f5cd49a612c0c081949759c1f931883c263 Author: Ruifeng Zheng AuthorDate: Tue Apr 23 23:09:10 2024 -0700 [SPARK-47969][PYTHON][TESTS] Make `test_creation_index` deterministic ### What changes were proposed in this pull request? Make `test_creation_index` deterministic ### Why are the changes needed? it may fail in some env ``` FAIL [16.261s]: test_creation_index (pyspark.pandas.tests.frame.test_constructor.FrameConstructorTests.test_creation_index) -- Traceback (most recent call last): File "/home/jenkins/python/pyspark/testing/pandasutils.py", line 91, in _assert_pandas_equal assert_frame_equal( File "/databricks/python3/lib/python3.11/site-packages/pandas/_testing/asserters.py", line 1257, in assert_frame_equal assert_index_equal( File "/databricks/python3/lib/python3.11/site-packages/pandas/_testing/asserters.py", line 407, in assert_index_equal raise_assert_detail(obj, msg, left, right) File "/databricks/python3/lib/python3.11/site-packages/pandas/_testing/asserters.py", line 665, in raise_assert_detail raise AssertionError(msg) AssertionError: DataFrame.index are different DataFrame.index values are different (40.0 %) [left]: Int64Index([2, 3, 4, 6, 5], dtype='int64') [right]: Int64Index([2, 3, 4, 5, 6], dtype='int64') ``` ### Does this PR introduce _any_ user-facing change? no. test only ### How was this patch tested? ci ### Was this patch authored or co-authored using generative AI tooling? no Closes #46200 from zhengruifeng/fix_test_creation_index. Authored-by: Ruifeng Zheng Signed-off-by: Dongjoon Hyun --- python/pyspark/pandas/tests/frame/test_constructor.py | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/python/pyspark/pandas/tests/frame/test_constructor.py b/python/pyspark/pandas/tests/frame/test_constructor.py index ee010d8f023d..d7581895c6c9 100644 --- a/python/pyspark/pandas/tests/frame/test_constructor.py +++ b/python/pyspark/pandas/tests/frame/test_constructor.py @@ -195,14 +195,14 @@ class FrameConstructorMixin: with ps.option_context("compute.ops_on_diff_frames", True): # test with ps.DataFrame and pd.Index self.assert_eq( -ps.DataFrame(data=psdf, index=pd.Index([2, 3, 4, 5, 6])), -pd.DataFrame(data=pdf, index=pd.Index([2, 3, 4, 5, 6])), +ps.DataFrame(data=psdf, index=pd.Index([2, 3, 4, 5, 6])).sort_index(), +pd.DataFrame(data=pdf, index=pd.Index([2, 3, 4, 5, 6])).sort_index(), ) # test with ps.DataFrame and ps.Index self.assert_eq( -ps.DataFrame(data=psdf, index=ps.Index([2, 3, 4, 5, 6])), -pd.DataFrame(data=pdf, index=pd.Index([2, 3, 4, 5, 6])), +ps.DataFrame(data=psdf, index=ps.Index([2, 3, 4, 5, 6])).sort_index(), +pd.DataFrame(data=pdf, index=pd.Index([2, 3, 4, 5, 6])).sort_index(), ) # test String Index - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
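A minimal sketch of the pattern used by the fix (illustrative only; the frame and index values here are made up): sort both sides by index before comparing, so the check no longer depends on whatever row order Spark happens to produce:
```python
import pandas as pd
from pandas.testing import assert_frame_equal
import pyspark.pandas as ps

pdf = pd.DataFrame({"a": [1, 2, 3, 4, 5]}, index=[2, 3, 4, 5, 6])
psdf = ps.from_pandas(pdf)

with ps.option_context("compute.ops_on_diff_frames", True):
    result = ps.DataFrame(data=psdf, index=ps.Index([2, 3, 4, 5, 6]))
    expected = pd.DataFrame(data=pdf, index=pd.Index([2, 3, 4, 5, 6]))
    # sort_index() removes the ordering nondeterminism that made the
    # original assertion flaky across environments.
    assert_frame_equal(result.sort_index().to_pandas(), expected.sort_index())
```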
(spark) branch master updated: [SPARK-47956][SQL] Sanity check for unresolved LCA reference
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 66613ba042c4 [SPARK-47956][SQL] Sanity check for unresolved LCA reference 66613ba042c4 is described below commit 66613ba042c4b73b45b3c71e79ce05c225f527e7 Author: Wenchen Fan AuthorDate: Tue Apr 23 08:44:48 2024 -0700 [SPARK-47956][SQL] Sanity check for unresolved LCA reference ### What changes were proposed in this pull request? This is a follow-up of https://github.com/apache/spark/pull/40558. The sanity check should apply to all plan nodes, not only Project/Aggregate/Window, as we don't know what bug can happen. Maybe the bug moves LCA references to other plan nodes. ### Why are the changes needed? better error message when bug happens ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? No Closes #46185 from cloud-fan/small. Authored-by: Wenchen Fan Signed-off-by: Dongjoon Hyun --- .../spark/sql/catalyst/analysis/CheckAnalysis.scala | 20 ++-- 1 file changed, 6 insertions(+), 14 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 10bff5e6e59a..d1b336b08955 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -110,9 +110,8 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB } /** Check and throw exception when a given resolved plan contains LateralColumnAliasReference. */ - private def checkNotContainingLCA(exprSeq: Seq[NamedExpression], plan: LogicalPlan): Unit = { -if (!plan.resolved) return - exprSeq.foreach(_.transformDownWithPruning(_.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE)) { + private def checkNotContainingLCA(exprs: Seq[Expression], plan: LogicalPlan): Unit = { + exprs.foreach(_.transformDownWithPruning(_.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE)) { case lcaRef: LateralColumnAliasReference => throw SparkException.internalError("Resolved plan should not contain any " + s"LateralColumnAliasReference.\nDebugging information: plan:\n$plan", @@ -789,17 +788,10 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB msg = s"Found the unresolved operator: ${o.simpleString(SQLConf.get.maxToStringFields)}", context = o.origin.getQueryContext, summary = o.origin.context.summary) - // If the plan is resolved, the resolved Project, Aggregate or Window should have restored or - // resolved all lateral column alias references. Add check for extra safe. - case p @ Project(pList, _) -if pList.exists(_.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE)) => -checkNotContainingLCA(pList, p) - case agg @ Aggregate(_, aggList, _) -if aggList.exists(_.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE)) => -checkNotContainingLCA(aggList, agg) - case w @ Window(pList, _, _, _) -if pList.exists(_.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE)) => -checkNotContainingLCA(pList, w) + // If the plan is resolved, all lateral column alias references should have been either + // restored or resolved. Add check for extra safe. 
+ case o if o.expressions.exists(_.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE)) => +checkNotContainingLCA(o.expressions, o) case _ => } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
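For readers unfamiliar with the feature being guarded here, a lateral column alias is simply a select-list item that refers to an alias defined earlier in the same SELECT; by the time analysis finishes, no `LateralColumnAliasReference` should remain anywhere in the plan. A small, illustrative query (not from the patch):
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# `adjusted` refers to the lateral column alias `double_salary` defined
# in the same SELECT list; the analyzer rewrites this reference away.
spark.sql("""
    SELECT salary * 2 AS double_salary,
           double_salary + 100 AS adjusted
    FROM VALUES (1000), (2000) AS t(salary)
""").show()
```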
(spark) branch master updated: [SPARK-47948][PYTHON] Upgrade the minimum `Pandas` version to 2.0.0
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2b01755f2791 [SPARK-47948][PYTHON] Upgrade the minimum `Pandas` version to 2.0.0 2b01755f2791 is described below commit 2b01755f27917b1d391835e6f8b1b2f9a34cc832 Author: Haejoon Lee AuthorDate: Tue Apr 23 07:49:15 2024 -0700 [SPARK-47948][PYTHON] Upgrade the minimum `Pandas` version to 2.0.0 ### What changes were proposed in this pull request? This PR proposes to bump Pandas version up to 2.0.0. ### Why are the changes needed? From Apache Spark 4.0.0, Pandas API on Spark supports Pandas 2.0.0 and above and some of features will be broken from Pandas 1.x, so installing Pandas 2.x is required. See the full list of breaking changes from [Upgrading from PySpark 3.5 to 4.0](https://github.com/apache/spark/blob/master/python/docs/source/migration_guide/pyspark_upgrade.rst#upgrading-from-pyspark-35-to-40). ### Does this PR introduce _any_ user-facing change? No API changes, but the minimum Pandas version from user-facing documentation will be changed. ### How was this patch tested? The existing CI should pass. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46175 from itholic/bump_pandas_2. Authored-by: Haejoon Lee Signed-off-by: Dongjoon Hyun --- dev/create-release/spark-rm/Dockerfile | 2 +- python/docs/source/getting_started/install.rst | 6 +++--- python/docs/source/migration_guide/pyspark_upgrade.rst | 3 +-- python/docs/source/user_guide/sql/arrow_pandas.rst | 2 +- python/packaging/classic/setup.py | 2 +- python/packaging/connect/setup.py | 2 +- python/pyspark/sql/pandas/utils.py | 2 +- 7 files changed, 9 insertions(+), 10 deletions(-) diff --git a/dev/create-release/spark-rm/Dockerfile b/dev/create-release/spark-rm/Dockerfile index f51b24d58394..8d5ca38ba88e 100644 --- a/dev/create-release/spark-rm/Dockerfile +++ b/dev/create-release/spark-rm/Dockerfile @@ -37,7 +37,7 @@ ENV DEBCONF_NONINTERACTIVE_SEEN true # These arguments are just for reuse and not really meant to be customized. ARG APT_INSTALL="apt-get install --no-install-recommends -y" -ARG PIP_PKGS="sphinx==4.5.0 mkdocs==1.1.2 numpy==1.20.3 pydata_sphinx_theme==0.13.3 ipython==7.19.0 nbsphinx==0.8.0 numpydoc==1.1.0 jinja2==3.1.2 twine==3.4.1 sphinx-plotly-directive==0.1.3 sphinx-copybutton==0.5.2 pandas==1.5.3 pyarrow==10.0.1 plotly==5.4.0 markupsafe==2.0.1 docutils<0.17 grpcio==1.62.0 protobuf==4.21.6 grpcio-status==1.62.0 googleapis-common-protos==1.56.4" +ARG PIP_PKGS="sphinx==4.5.0 mkdocs==1.1.2 numpy==1.20.3 pydata_sphinx_theme==0.13.3 ipython==7.19.0 nbsphinx==0.8.0 numpydoc==1.1.0 jinja2==3.1.2 twine==3.4.1 sphinx-plotly-directive==0.1.3 sphinx-copybutton==0.5.2 pandas==2.0.3 pyarrow==10.0.1 plotly==5.4.0 markupsafe==2.0.1 docutils<0.17 grpcio==1.62.0 protobuf==4.21.6 grpcio-status==1.62.0 googleapis-common-protos==1.56.4" ARG GEM_PKGS="bundler:2.3.8" # Install extra needed repos and refresh. diff --git a/python/docs/source/getting_started/install.rst b/python/docs/source/getting_started/install.rst index 08b6cc813cba..33a0560764df 100644 --- a/python/docs/source/getting_started/install.rst +++ b/python/docs/source/getting_started/install.rst @@ -205,7 +205,7 @@ Installable with ``pip install "pyspark[connect]"``. 
== = == PackageSupported version Note == = == -`pandas` >=1.4.4 Required for Spark Connect +`pandas` >=2.0.0 Required for Spark Connect `pyarrow` >=10.0.0 Required for Spark Connect `grpcio` >=1.62.0 Required for Spark Connect `grpcio-status`>=1.62.0 Required for Spark Connect @@ -220,7 +220,7 @@ Installable with ``pip install "pyspark[sql]"``. = = == Package Supported version Note = = == -`pandas` >=1.4.4 Required for Spark SQL +`pandas` >=2.0.0 Required for Spark SQL `pyarrow` >=10.0.0 Required for Spark SQL = = == @@ -233,7 +233,7 @@ Installable with ``pip install "pyspark[pandas_on_spark]"``. = = Package Supported version Note = =
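If in doubt about whether an environment already satisfies the new floor, the helper PySpark itself uses can be called directly; a small illustrative check (the helper lives in the module touched by this patch):
```python
from pyspark.sql.pandas.utils import require_minimum_pandas_version

# Raises an error if the installed pandas is older than the minimum
# PySpark requires (2.0.0 as of this change); otherwise it is a no-op.
require_minimum_pandas_version()

import pandas as pd
print(pd.__version__)
```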
(spark) branch master updated (cf5fc0c720ee -> 9c4f12ca04ac)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from cf5fc0c720ee [MINOR][DOCS] Fix type hint of 3 functions add 9c4f12ca04ac [SPARK-47949][SQL][DOCKER][TESTS] MsSQLServer: Bump up mssql docker image version to 2022-CU12-GDR1-ubuntu-22.04 No new revisions were added by this update. Summary of changes: ...OnDocker.scala => MsSQLServerDatabaseOnDocker.scala} | 13 +++-- .../spark/sql/jdbc/MsSqlServerIntegrationSuite.scala| 14 +- .../spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 16 ++-- .../spark/sql/jdbc/v2/MsSqlServerNamespaceSuite.scala | 17 ++--- 4 files changed, 12 insertions(+), 48 deletions(-) copy connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/{MySQLDatabaseOnDocker.scala => MsSQLServerDatabaseOnDocker.scala} (72%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [MINOR][DOCS] Fix type hint of 3 functions
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new cf5fc0c720ee [MINOR][DOCS] Fix type hint of 3 functions cf5fc0c720ee is described below commit cf5fc0c720eef01c5fe86a6ce05160adbdbf4678 Author: Ruifeng Zheng AuthorDate: Tue Apr 23 07:42:44 2024 -0700 [MINOR][DOCS] Fix type hint of 3 functions ### What changes were proposed in this pull request? Fix type hint of 3 functions I did a quick scan of the functions, don't find other similar places. ### Why are the changes needed? a string input will be treated as literal instead of column name ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ci ### Was this patch authored or co-authored using generative AI tooling? no Closes #46179 from zhengruifeng/correct_con. Authored-by: Ruifeng Zheng Signed-off-by: Dongjoon Hyun --- python/pyspark/sql/connect/functions/builtin.py | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/python/pyspark/sql/connect/functions/builtin.py b/python/pyspark/sql/connect/functions/builtin.py index 519e53c3a13f..8fffb1831466 100644 --- a/python/pyspark/sql/connect/functions/builtin.py +++ b/python/pyspark/sql/connect/functions/builtin.py @@ -2141,7 +2141,7 @@ def sequence( sequence.__doc__ = pysparkfuncs.sequence.__doc__ -def schema_of_csv(csv: "ColumnOrName", options: Optional[Dict[str, str]] = None) -> Column: +def schema_of_csv(csv: Union[str, Column], options: Optional[Dict[str, str]] = None) -> Column: if isinstance(csv, Column): _csv = csv elif isinstance(csv, str): @@ -2161,7 +2161,7 @@ def schema_of_csv(csv: "ColumnOrName", options: Optional[Dict[str, str]] = None) schema_of_csv.__doc__ = pysparkfuncs.schema_of_csv.__doc__ -def schema_of_json(json: "ColumnOrName", options: Optional[Dict[str, str]] = None) -> Column: +def schema_of_json(json: Union[str, Column], options: Optional[Dict[str, str]] = None) -> Column: if isinstance(json, Column): _json = json elif isinstance(json, str): @@ -2181,7 +2181,7 @@ def schema_of_json(json: "ColumnOrName", options: Optional[Dict[str, str]] = Non schema_of_json.__doc__ = pysparkfuncs.schema_of_json.__doc__ -def schema_of_xml(xml: "ColumnOrName", options: Optional[Dict[str, str]] = None) -> Column: +def schema_of_xml(xml: Union[str, Column], options: Optional[Dict[str, str]] = None) -> Column: if isinstance(xml, Column): _xml = xml elif isinstance(xml, str): - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
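The practical consequence of the corrected hints, shown as a small illustrative snippet: a plain Python string passed to these functions is a literal sample, not a column name, and a `Column` argument must be a foldable literal as well:
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit, schema_of_json

spark = SparkSession.builder.getOrCreate()
df = spark.range(1)

# The string is the JSON sample itself, not the name of a column.
df.select(schema_of_json('{"a": 1, "b": "x"}')).show(truncate=False)

# Equivalent, passing an explicit literal Column.
df.select(schema_of_json(lit('{"a": 1, "b": "x"}'))).show(truncate=False)
```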
(spark) branch master updated (ca916258b991 -> 33fa77cb4868)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from ca916258b991 [SPARK-47953][DOCS] MsSQLServer: Document Mapping Spark SQL Data Types to Microsoft SQL Server add 33fa77cb4868 [MINOR][DOCS] Add `docs/_generated/` to .gitignore No new revisions were added by this update. Summary of changes: .gitignore | 1 + 1 file changed, 1 insertion(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47953][DOCS] MsSQLServer: Document Mapping Spark SQL Data Types to Microsoft SQL Server
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ca916258b991 [SPARK-47953][DOCS] MsSQLServer: Document Mapping Spark SQL Data Types to Microsoft SQL Server ca916258b991 is described below commit ca916258b9916452aa2f377608e6be8df65550e5 Author: Kent Yao AuthorDate: Tue Apr 23 07:41:04 2024 -0700 [SPARK-47953][DOCS] MsSQLServer: Document Mapping Spark SQL Data Types to Microsoft SQL Server ### What changes were proposed in this pull request? This PR adds Document Mapping Spark SQL Data Types to Microsoft SQL Server ### Why are the changes needed? doc improvement ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? doc build ![image](https://github.com/apache/spark/assets/8326978/7220d96a-c5ca-4780-9fc5-f93c99f91c10) ### Was this patch authored or co-authored using generative AI tooling? no Closes #46177 from yaooqinn/SPARK-47953. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- docs/sql-data-sources-jdbc.md | 106 ++ 1 file changed, 106 insertions(+) diff --git a/docs/sql-data-sources-jdbc.md b/docs/sql-data-sources-jdbc.md index 51c0886430a3..734ed43f912a 100644 --- a/docs/sql-data-sources-jdbc.md +++ b/docs/sql-data-sources-jdbc.md @@ -1630,3 +1630,109 @@ as the activated JDBC Driver. + +### Mapping Spark SQL Data Types to Microsoft SQL Server + +The below table describes the data type conversions from Spark SQL Data Types to Microsoft SQL Server data types, +when creating, altering, or writing data to a Microsoft SQL Server table using the built-in jdbc data source with +the mssql-jdbc as the activated JDBC Driver. + + + + + Spark SQL Data Type + SQL Server Data Type + Remarks + + + + + BooleanType + bit + + + + ByteType + smallint + Supported since Spark 4.0.0, previous versions throw errors + + + ShortType + smallint + + + + IntegerType + int + + + + LongType + bigint + + + + FloatType + real + + + + DoubleType + double precision + + + + DecimalType(p, s) + number(p,s) + + + + DateType + date + + + + TimestampType + datetime + + + + TimestampNTZType + datetime + + + + StringType + nvarchar(max) + + + + BinaryType + varbinary(max) + + + + CharType(n) + char(n) + + + + VarcharType(n) + varchar(n) + + + + + +The Spark Catalyst data types below are not supported with suitable SQL Server types. + +- DayTimeIntervalType +- YearMonthIntervalType +- CalendarIntervalType +- ArrayType +- MapType +- StructType +- UserDefinedType +- NullType +- ObjectType +- VariantType - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
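To see these mappings in practice, one can write a small DataFrame through the built-in JDBC source; the sketch below is illustrative only — the server URL, database, table name, and credentials are placeholders, not values from the documentation:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.sql("""
    SELECT CAST(1 AS INT)        AS c_int,     -- maps to int
           CAST(1 AS BIGINT)     AS c_long,    -- maps to bigint
           CAST(1.5 AS DOUBLE)   AS c_double,  -- maps to double precision
           CAST('abc' AS STRING) AS c_string   -- maps to nvarchar(max)
""")

(df.write.format("jdbc")
    .option("url", "jdbc:sqlserver://example-host:1433;databaseName=testdb")
    .option("dbtable", "dbo.type_mapping_demo")
    .option("user", "example_user")
    .option("password", "example_password")
    .save())
```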
(spark-kubernetes-operator) branch main updated: [SPARK-47943] Add `GitHub Action` CI for Java Build and Test
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new 4a5febd [SPARK-47943] Add `GitHub Action` CI for Java Build and Test 4a5febd is described below commit 4a5febd8f48716c0506738fc6a5fd58afb95779f Author: zhou-jiang AuthorDate: Mon Apr 22 22:44:17 2024 -0700 [SPARK-47943] Add `GitHub Action` CI for Java Build and Test ### What changes were proposed in this pull request? This PR adds an additional CI build task for operator. ### Why are the changes needed? The additional CI task is needed in order to build and test Java code for upcoming operator pull requests. When Java plugin is enabled and Java source is checked in, `./gradlew build` [task](https://docs.gradle.org/3.3/userguide/java_plugin.html#sec:java_tasks) by default includes a set of tasks to compile and run tests. This can serve as pull request build. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? tested locally. ### Was this patch authored or co-authored using generative AI tooling? no Closes #7 from jiangzho/ci. Authored-by: zhou-jiang Signed-off-by: Dongjoon Hyun --- .github/workflows/build_and_test.yml | 18 +- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index 6a5a147..887119f 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -26,4 +26,20 @@ jobs: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} with: config: .github/.licenserc.yaml - + build-test: +name: "Build Test CI" +runs-on: ubuntu-latest +strategy: + matrix: +java-version: [ 17, 21 ] +steps: + - name: Checkout repository +uses: actions/checkout@v3 + - name: Set up JDK ${{ matrix.java-version }} +uses: actions/setup-java@v2 +with: + java-version: ${{ matrix.java-version }} + distribution: 'adopt' + - name: Build with Gradle +run: | + set -o pipefail; ./gradlew build; set +o pipefail - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark-kubernetes-operator) branch main updated: [SPARK-47929] Setup Static Analysis for Operator
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new 798ca15 [SPARK-47929] Setup Static Analysis for Operator 798ca15 is described below commit 798ca15844c71baf5d7f1f8842e461a73c1009a9 Author: zhou-jiang AuthorDate: Mon Apr 22 22:42:23 2024 -0700 [SPARK-47929] Setup Static Analysis for Operator ### What changes were proposed in this pull request? This is a breakdown PR from #2 - setting up common build Java tasks and corresponding plugins. ### Why are the changes needed? This PR includes checkstyle, pmd, spotbugs. Also includes jacoco for coverage analysis, spotless for formatting. These tasks can help to enhance the quality of future Java contributions. They can also be referred in CI tasks for automation. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tested manually. ### Was this patch authored or co-authored using generative AI tooling? no Closes #6 from jiangzho/builder_task. Authored-by: zhou-jiang Signed-off-by: Dongjoon Hyun --- build.gradle | 76 - config/checkstyle/checkstyle.xml | 208 +++ config/pmd/ruleset.xml | 33 ++ config/spotbugs/spotbugs_exclude.xml | 25 + gradle.properties| 22 5 files changed, 362 insertions(+), 2 deletions(-) diff --git a/build.gradle b/build.gradle index 6732f5a..f64212b 100644 --- a/build.gradle +++ b/build.gradle @@ -1,3 +1,18 @@ +buildscript { + repositories { +maven { + url = uri("https://plugins.gradle.org/m2/;) +} + } + dependencies { +classpath "com.github.spotbugs.snom:spotbugs-gradle-plugin:${spotBugsGradlePluginVersion}" +classpath "com.diffplug.spotless:spotless-plugin-gradle:${spotlessPluginVersion}" + } +} + +assert JavaVersion.current().isCompatibleWith(JavaVersion.VERSION_17): "Java 17 or newer is " + +"required" + subprojects { apply plugin: 'idea' apply plugin: 'eclipse' @@ -6,7 +21,64 @@ subprojects { targetCompatibility = 17 repositories { - mavenCentral() - jcenter() +mavenCentral() +jcenter() + } + + apply plugin: 'checkstyle' + checkstyle { +toolVersion = checkstyleVersion +configFile = file("$rootDir/config/checkstyle/checkstyle.xml") +ignoreFailures = false +showViolations = true + } + + apply plugin: 'pmd' + pmd { +ruleSets = ["java-basic", "java-braces"] +ruleSetFiles = files("$rootDir/config/pmd/ruleset.xml") +toolVersion = pmdVersion +consoleOutput = true +ignoreFailures = false + } + + apply plugin: 'com.github.spotbugs' + spotbugs { +toolVersion = spotBugsVersion +afterEvaluate { + reportsDir = file("${project.reporting.baseDir}/findbugs") +} +excludeFilter = file("$rootDir/config/spotbugs/spotbugs_exclude.xml") +ignoreFailures = false + } + + apply plugin: 'jacoco' + jacoco { +toolVersion = jacocoVersion + } + jacocoTestReport { +dependsOn test + } + + apply plugin: 'com.diffplug.spotless' + spotless { +java { + endWithNewline() + googleJavaFormat('1.17.0') + importOrder( +'java', +'javax', +'scala', +'', +'org.apache.spark', + ) + trimTrailingWhitespace() + removeUnusedImports() +} +format 'misc', { + target '*.md', '*.gradle', '**/*.properties', '**/*.xml', '**/*.yaml', '**/*.yml' + endWithNewline() + trimTrailingWhitespace() +} } } diff --git a/config/checkstyle/checkstyle.xml b/config/checkstyle/checkstyle.xml new file mode 100644 index 000..90161fe --- /dev/null +++ b/config/checkstyle/checkstyle.xml @@ -0,0 +1,208 @@ + + +https://checkstyle.org/dtds/configuration_1_3.dtd;> + 
+ [The XML bodies of config/checkstyle/checkstyle.xml, config/pmd/ruleset.xml, config/spotbugs/spotbugs_exclude.xml, and the gradle.properties entries were stripped to bare "+" markers by the archive's markup handling and are not recoverable here; see the pull request for the full files.]
(spark) branch master updated (9d715ba49171 -> 876c2cf34a35)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 9d715ba49171 [SPARK-47938][SQL] MsSQLServer: Cannot find data type BYTE error add 876c2cf34a35 [SPARK-44170][BUILD][FOLLOWUP] Align JUnit5 dependency's version and clean up exclusions No new revisions were added by this update. Summary of changes: pom.xml | 69 +++-- 1 file changed, 41 insertions(+), 28 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47938][SQL] MsSQLServer: Cannot find data type BYTE error
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9d715ba49171 [SPARK-47938][SQL] MsSQLServer: Cannot find data type BYTE error 9d715ba49171 is described below commit 9d715ba491710969340d9e8a49a21d11f51ef7d3 Author: Kent Yao AuthorDate: Mon Apr 22 22:31:13 2024 -0700 [SPARK-47938][SQL] MsSQLServer: Cannot find data type BYTE error ### What changes were proposed in this pull request? This PR uses SMALLINT (as TINYINT ranges [0, 255]) instead of BYTE to fix the ByteType mapping for MsSQLServer JDBC ```java [info] com.microsoft.sqlserver.jdbc.SQLServerException: Column, parameter, or variable #1: Cannot find data type BYTE. [info] at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:265) [info] at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1662) [info] at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:898) [info] at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:793) [info] at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7417) [info] at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:3488) [info] at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:262) [info] at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:237) [info] at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeUpdate(SQLServerStatement.java:733) [info] at org.apache.spark.sql.jdbc.JdbcDialect.createTable(JdbcDialects.scala:267) ``` ### Why are the changes needed? bugfix ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46164 from yaooqinn/SPARK-47938. 
Lead-authored-by: Kent Yao Co-authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala | 8 .../main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala | 1 + 2 files changed, 9 insertions(+) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala index 8bceb9506e85..273e8c35dd07 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala @@ -437,4 +437,12 @@ class MsSqlServerIntegrationSuite extends DockerJDBCIntegrationSuite { .load() assert(df.collect().toSet === expectedResult) } + + test("SPARK-47938: Fix 'Cannot find data type BYTE' in SQL Server") { +spark.sql("select cast(1 as byte) as c0") + .write + .jdbc(jdbcUrl, "test_byte", new Properties) +val df = spark.read.jdbc(jdbcUrl, "test_byte", new Properties) +checkAnswer(df, Row(1.toShort)) + } } diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala index 862e99adc3b0..1d05c0d7c24e 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala @@ -136,6 +136,7 @@ private case class MsSqlServerDialect() extends JdbcDialect { case BinaryType => Some(JdbcType("VARBINARY(MAX)", java.sql.Types.VARBINARY)) case ShortType if !SQLConf.get.legacyMsSqlServerNumericMappingEnabled => Some(JdbcType("SMALLINT", java.sql.Types.SMALLINT)) +case ByteType => Some(JdbcType("SMALLINT", java.sql.Types.TINYINT)) case _ => None } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
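The regression test added above translates naturally to PySpark; in the illustrative sketch below the JDBC URL and connection properties are placeholders, not values from the patch:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

url = "jdbc:sqlserver://example-host:1433;databaseName=testdb"
props = {"user": "example_user", "password": "example_password"}

# Before the fix this failed with "Cannot find data type BYTE", because
# ByteType had no SQL Server mapping; it is now written as SMALLINT.
spark.sql("SELECT CAST(1 AS BYTE) AS c0").write.jdbc(url, "test_byte", properties=props)

spark.read.jdbc(url, "test_byte", properties=props).show()
```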
(spark) branch master updated (e4fb7dd98219 -> a97e72cfa7d4)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from e4fb7dd98219 [MINOR] Remove unnecessary `imports` add a97e72cfa7d4 [SPARK-47937][PYTHON][DOCS] Fix docstring of `hll_sketch_agg` No new revisions were added by this update. Summary of changes: python/pyspark/sql/connect/functions/builtin.py | 8 +--- python/pyspark/sql/functions/builtin.py | 12 +++- 2 files changed, 12 insertions(+), 8 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
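For reference, the (now corrected) docstring describes an aggregate that is typically paired with `hll_sketch_estimate`; a short illustrative usage:
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import hll_sketch_agg, hll_sketch_estimate

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([1, 2, 2, 3], "INT")
# Build an HLL sketch of the column, then estimate its distinct count (3 here).
df.agg(hll_sketch_estimate(hll_sketch_agg("value")).alias("distinct_cnt")).show()
```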
(spark) branch master updated (b335dd366fb1 -> e4fb7dd98219)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from b335dd366fb1 [SPARK-47909][CONNECT][PYTHON][TESTS][FOLLOW-UP] Move `pyspark.classic` references add e4fb7dd98219 [MINOR] Remove unnecessary `imports` No new revisions were added by this update. Summary of changes: core/src/main/scala/org/apache/spark/util/Distribution.scala| 2 -- .../scala/org/apache/spark/input/WholeTextFileInputFormatSuite.scala| 2 -- .../scala/org/apache/spark/input/WholeTextFileRecordReaderSuite.scala | 2 -- sql/api/src/main/scala/org/apache/spark/sql/types/UpCastRule.scala | 2 -- .../src/main/scala/org/apache/spark/sql/execution/CacheManager.scala| 2 -- .../scala/org/apache/spark/sql/CollationRegexpExpressionsSuite.scala| 2 -- .../scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala| 2 -- sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala | 1 - .../test/scala/org/apache/spark/sql/hive/client/HiveClientSuites.scala | 2 -- .../org/apache/spark/sql/hive/client/HiveClientUserNameSuites.scala | 2 -- .../scala/org/apache/spark/sql/hive/client/HiveClientVersions.scala | 2 -- .../org/apache/spark/sql/hive/client/HivePartitionFilteringSuites.scala | 2 -- 12 files changed, 23 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-47904][SQL][3.5] Preserve case in Avro schema when using enableStableIdentifiersForUnionType
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new d7c3794a0c56 [SPARK-47904][SQL][3.5] Preserve case in Avro schema when using enableStableIdentifiersForUnionType d7c3794a0c56 is described below commit d7c3794a0c567b12e8c8e18132aa362f11acdf5f Author: Ivan Sadikov AuthorDate: Mon Apr 22 15:36:13 2024 -0700 [SPARK-47904][SQL][3.5] Preserve case in Avro schema when using enableStableIdentifiersForUnionType ### What changes were proposed in this pull request? Backport of https://github.com/apache/spark/pull/46126 to branch-3.5. When `enableStableIdentifiersForUnionType` is enabled, all of the types are lowercased which creates a problem when field types are case-sensitive: Union type with fields: ``` Schema.createEnum("myENUM", "", null, List[String]("E1", "e2").asJava), Schema.createRecord("myRecord2", "", null, false, List[Schema.Field](new Schema.Field("F", Schema.create(Type.FLOAT))).asJava) ``` would become ``` struct> ``` but instead should be ``` struct> ``` ### Why are the changes needed? Fixes a bug of lowercasing the field name (the type portion). ### Does this PR introduce _any_ user-facing change? Yes, if a user enables `enableStableIdentifiersForUnionType` and has Union types, all fields will preserve the case. Previously, the field names would be all in lowercase. ### How was this patch tested? I added a test case to verify the new field names. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46169 from sadikovi/SPARK-47904-3.5. Authored-by: Ivan Sadikov Signed-off-by: Dongjoon Hyun --- .../apache/spark/sql/avro/SchemaConverters.scala | 10 +++ .../org/apache/spark/sql/avro/AvroSuite.scala | 31 -- 2 files changed, 34 insertions(+), 7 deletions(-) diff --git a/connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala b/connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala index 06abe977e3b0..af358a8d1c96 100644 --- a/connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala +++ b/connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala @@ -183,14 +183,14 @@ object SchemaConverters { // Avro's field name may be case sensitive, so field names for two named type // could be "a" and "A" and we need to distinguish them. In this case, we throw // an exception. - val temp_name = s"member_${s.getName.toLowerCase(Locale.ROOT)}" - if (fieldNameSet.contains(temp_name)) { + // Stable id prefix can be empty so the name of the field can be just the type. 
+ val tempFieldName = s"member_${s.getName}" + if (!fieldNameSet.add(tempFieldName.toLowerCase(Locale.ROOT))) { throw new IncompatibleSchemaException( - "Cannot generate stable indentifier for Avro union type due to name " + + "Cannot generate stable identifier for Avro union type due to name " + s"conflict of type name ${s.getName}") } - fieldNameSet.add(temp_name) - temp_name + tempFieldName } else { s"member$i" } diff --git a/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala b/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala index 1df99210a55a..01c9dfb57a19 100644 --- a/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala +++ b/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala @@ -370,7 +370,7 @@ abstract class AvroSuite "", Seq()) } - assert(e.getMessage.contains("Cannot generate stable indentifier")) + assert(e.getMessage.contains("Cannot generate stable identifier")) } { val e = intercept[Exception] { @@ -381,7 +381,7 @@ abstract class AvroSuite "", Seq()) } - assert(e.getMessage.contains("Cannot generate stable indentifier")) + assert(e.getMessage.contains("Cannot generate stable identifier")) } // Two array types or two map types are not allowed in union. { @@ -434,6 +434,33 @@ abstract class AvroSuite } } + tes
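From the user side, the behavior is controlled by the Avro option named in the title; a hedged sketch of reading such data (the file path is a placeholder, and the field names assume the union example from the description):
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (spark.read.format("avro")
      .option("enableStableIdentifiersForUnionType", "true")
      .load("/tmp/union_type_data.avro"))  # placeholder path

# With this fix, a union of `myENUM` and `myRecord2` surfaces as fields named
# member_myENUM and member_myRecord2, preserving case instead of lower-casing.
df.printSchema()
```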
(spark) branch master updated: [SPARK-47942][K8S][DOCS] Drop K8s v1.26 Support
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ac9a12ef6e06 [SPARK-47942][K8S][DOCS] Drop K8s v1.26 Support ac9a12ef6e06 is described below commit ac9a12ef6e062ae07e878e202521b22de9979a17 Author: Dongjoon Hyun AuthorDate: Mon Apr 22 14:46:03 2024 -0700 [SPARK-47942][K8S][DOCS] Drop K8s v1.26 Support ### What changes were proposed in this pull request? This PR aims to update K8s docs to recommend K8s v1.27+ for Apache Spark 4.0.0. This is a kind of follow-up of the following previous PR because Apache Spark 4.0.0 schedule is delayed slightly. - #43069 ### Why are the changes needed? **1. K8s community starts to release v1.30.0 from 2024-04-17.** - https://kubernetes.io/releases/#release-v1-30 **2. Default K8s Version in Public Cloud environments** The default K8s versions of public cloud providers are already K8s 1.27+. - EKS: v1.29 (Default) - GKE: v1.29 (Rapid), v1.28 (Regular), v1.27 (Stable) - AKS: v1.27 **3. End Of Support** In addition, K8s 1.26 is going to reach EOL when Apache Spark 4.0.0 arrives because K8s 1.26 is also going to reach EOL on June. | K8s | AKS | GKE | EKS | | | --- | --- | --- | | 1.26 | 2024-03 | 2024-06 | 2024-06 | - [AKS EOL Schedule](https://docs.microsoft.com/en-us/azure/aks/supported-kubernetes-versions?tabs=azure-cli#aks-kubernetes-release-calendar) - [GKE EOL Schedule](https://cloud.google.com/kubernetes-engine/docs/release-schedule) - [EKS EOL Schedule](https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html#kubernetes-release-calendar) ### Does this PR introduce _any_ user-facing change? - No, this is a documentation-only change about K8s versions. - Apache Spark K8s Integration Test is currently using K8s v1.30.0 on Minikube already. ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46168 from dongjoon-hyun/SPARK-47942. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- docs/running-on-kubernetes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md index 778af5f0751a..606b5eb6f900 100644 --- a/docs/running-on-kubernetes.md +++ b/docs/running-on-kubernetes.md @@ -44,7 +44,7 @@ Cluster administrators should use [Pod Security Policies](https://kubernetes.io/ # Prerequisites -* A running Kubernetes cluster at version >= 1.26 with access configured to it using +* A running Kubernetes cluster at version >= 1.27 with access configured to it using [kubectl](https://kubernetes.io/docs/reference/kubectl/). If you do not already have a working Kubernetes cluster, you may set up a test cluster on your local machine using [minikube](https://kubernetes.io/docs/getting-started-guides/minikube/). - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (f2d0cf23018f -> fc0c8553ea05)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from f2d0cf23018f [SPARK-47907][SQL] Put bang under a config add fc0c8553ea05 [SPARK-47904][SQL] Preserve case in Avro schema when using enableStableIdentifiersForUnionType No new revisions were added by this update. Summary of changes: .../apache/spark/sql/avro/SchemaConverters.scala | 8 +++--- .../org/apache/spark/sql/avro/AvroSuite.scala | 31 -- 2 files changed, 32 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47940][BUILD][TESTS] Upgrade `guava` dependency to `33.1.0-jre` in Docker IT
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 86563169eef8 [SPARK-47940][BUILD][TESTS] Upgrade `guava` dependency to `33.1.0-jre` in Docker IT 86563169eef8 is described below commit 86563169eef899040e1ec70dd9963c64311dbaa1 Author: Cheng Pan AuthorDate: Mon Apr 22 13:34:20 2024 -0700 [SPARK-47940][BUILD][TESTS] Upgrade `guava` dependency to `33.1.0-jre` in Docker IT ### What changes were proposed in this pull request? This PR aims to upgrade `guava` dependency to `33.1.0-jre` in Docker Integration tests. ### Why are the changes needed? This is a preparation of the following PR. - #45372 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46167 from dongjoon-hyun/SPARK-47940. Authored-by: Cheng Pan Signed-off-by: Dongjoon Hyun --- connector/docker-integration-tests/pom.xml | 2 +- project/SparkBuild.scala | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/connector/docker-integration-tests/pom.xml b/connector/docker-integration-tests/pom.xml index bb7647c72491..9003c2190be2 100644 --- a/connector/docker-integration-tests/pom.xml +++ b/connector/docker-integration-tests/pom.xml @@ -39,7 +39,7 @@ com.google.guava guava - 33.0.0-jre + 33.1.0-jre test diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index bcaa51ec30ff..1bcc9c893393 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -952,7 +952,7 @@ object Unsafe { object DockerIntegrationTests { // This serves to override the override specified in DependencyOverrides: lazy val settings = Seq( -dependencyOverrides += "com.google.guava" % "guava" % "33.0.0-jre" +dependencyOverrides += "com.google.guava" % "guava" % "33.1.0-jre" ) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (256fc51508e4 -> 676d47ffe091)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 256fc51508e4 [SPARK-47411][SQL] Support StringInstr & FindInSet functions to work with collated strings add 676d47ffe091 [SPARK-47935][INFRA][PYTHON] Pin `pandas==2.0.3` for `pypy3.8` No new revisions were added by this update. Summary of changes: dev/infra/Dockerfile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47930][BUILD] Upgrade RoaringBitmap to 1.0.6
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2fb31dea1c53 [SPARK-47930][BUILD] Upgrade RoaringBitmap to 1.0.6 2fb31dea1c53 is described below commit 2fb31dea1c53352a8101bb0ec91f46c7d7ff826e Author: panbingkun AuthorDate: Mon Apr 22 00:44:32 2024 -0700 [SPARK-47930][BUILD] Upgrade RoaringBitmap to 1.0.6 ### What changes were proposed in this pull request? The pr aims to upgrade `RoaringBitmap` from `1.0.5` to `1.0.6`. ### Why are the changes needed? The full release notes: https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/1.0.6 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46152 from panbingkun/SPARK-47930. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt | 8 core/benchmarks/MapStatusesConvertBenchmark-results.txt | 8 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 4 files changed, 10 insertions(+), 10 deletions(-) diff --git a/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt index 502d10c1c58c..607efde07d1e 100644 --- a/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt +++ b/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt @@ -2,12 +2,12 @@ MapStatuses Convert Benchmark -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor MapStatuses Convert: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -Num Maps: 5 Fetch partitions:500708715 8 0.0 707870326.0 1.0X -Num Maps: 5 Fetch partitions:1000 1610 1623 12 0.0 1610312472.0 0.4X -Num Maps: 5 Fetch partitions:1500 2443 2461 23 0.0 2442675908.0 0.3X +Num Maps: 5 Fetch partitions:500686690 4 0.0 686489113.0 1.0X +Num Maps: 5 Fetch partitions:1000 1701 1727 24 0.0 1700658689.0 0.4X +Num Maps: 5 Fetch partitions:1500 2750 2760 13 0.0 2749746755.0 0.2X diff --git a/core/benchmarks/MapStatusesConvertBenchmark-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-results.txt index 9fe4175bb5d9..3efec12b2cb3 100644 --- a/core/benchmarks/MapStatusesConvertBenchmark-results.txt +++ b/core/benchmarks/MapStatusesConvertBenchmark-results.txt @@ -2,12 +2,12 @@ MapStatuses Convert Benchmark -OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1016-azure +OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor MapStatuses Convert: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -Num Maps: 5 Fetch partitions:500775778 5 0.0 774980756.0 1.0X -Num Maps: 5 Fetch partitions:1000 1765 1765 1 0.0 1765011999.0 0.4X -Num Maps: 5 Fetch partitions:1500 2671 2682 15 0.0 2671372452.0 0.3X +Num Maps: 5 Fetch partitions:500736746 12 0.0 736390304.0 1.0X +Num Maps: 5 Fetch partitions:1000 1615 1632 16 0.0 1615129364.0 0.5X +Num Maps: 5 Fetch partitions:1500 2574 2589 14 0.0 2573656222.0 0.3X diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 6420c9df4d16..c1adff73d339 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -1,7 +1,7 @@ HikariCP/2.5.1//HikariCP-2.5.1.jar 
JLargeArrays/1.5//JLargeArrays-1.5.jar
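Editorial note: the MapStatuses benchmark above is sensitive to this dependency bump because Spark's HighlyCompressedMapStatus uses RoaringBitmap to track which shuffle blocks are empty. As a point of reference only (this Scala snippet is not part of the commit), the RoaringBitmap API involved is roughly:

```
// Illustrative only: the core RoaringBitmap calls Spark relies on,
// e.g. in HighlyCompressedMapStatus to record empty shuffle blocks.
import org.roaringbitmap.RoaringBitmap

val emptyBlocks = new RoaringBitmap()   // compressed bitmap of block indices
emptyBlocks.add(3)                      // mark block 3 as empty
emptyBlocks.add(100000)                 // sparse, high indices stay cheap
assert(emptyBlocks.contains(3))
assert(!emptyBlocks.contains(4))
println(s"tracked blocks: ${emptyBlocks.getCardinality}")
```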
(spark) branch master updated: [SPARK-47925][SQL][TESTS] Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new adf02d38061b [SPARK-47925][SQL][TESTS] Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest` adf02d38061b is described below commit adf02d38061bd0ef48fd07252bef7706a0e49757 Author: Dongjoon Hyun AuthorDate: Fri Apr 19 20:04:13 2024 -0700 [SPARK-47925][SQL][TESTS] Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest` ### What changes were proposed in this pull request? This PR aims to mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest` to run in a different test pipeline. ### Why are the changes needed? This will move this test case from `sql - other tests` to `sql - extended tests` to rebalance test pipelines. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46145 from dongjoon-hyun/SPARK-47925. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/BloomFilterAggregateQuerySuite.scala | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/BloomFilterAggregateQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/BloomFilterAggregateQuerySuite.scala index 4edb51d27190..9b39a2295e7d 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/BloomFilterAggregateQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/BloomFilterAggregateQuerySuite.scala @@ -26,10 +26,12 @@ import org.apache.spark.sql.execution.aggregate.BaseAggregateExec import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.test.SharedSparkSession import org.apache.spark.sql.types.LongType +import org.apache.spark.tags.ExtendedSQLTest /** * Query tests for the Bloom filter aggregate and filter function. */ +@ExtendedSQLTest class BloomFilterAggregateQuerySuite extends QueryTest with SharedSparkSession { import testImplicits._ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47923][R] Upgrade the minimum version of `arrow` R package to 10.0.0
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3fcc0f7ac142 [SPARK-47923][R] Upgrade the minimum version of `arrow` R package to 10.0.0 3fcc0f7ac142 is described below commit 3fcc0f7ac142756b38f66085543ca045abe76a9f Author: Dongjoon Hyun AuthorDate: Fri Apr 19 19:58:15 2024 -0700 [SPARK-47923][R] Upgrade the minimum version of `arrow` R package to 10.0.0 ### What changes were proposed in this pull request? This PR aims to upgrade the minimum version of `arrow` R package to 10.0.0 like PySpark. ### Why are the changes needed? Apache Spark `master` branch tests only with the latest R package which is `15.0.1` as of now. To avoid any incompatibility issues across R and Python, we had better use the same minimum policy. ``` $ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-8755911327 R -e 'installed.packages()' | grep arrow | head -n1 arrow"arrow""/usr/local/lib/R/site-library" "15.0.1" ``` ### Does this PR introduce _any_ user-facing change? Yes, but most SparkR users has been using the latest one which is higher than 10.0.0 because `Arrow R package 10.0.0` was released 2022-10-26 and has been used over one and half years. - https://cran.r-project.org/src/contrib/Archive/arrow/ ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46142 from dongjoon-hyun/SPARK-47923. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- R/pkg/DESCRIPTION | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 2523104268d3..f7dd261c10fd 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -21,7 +21,7 @@ Suggests: testthat, e1071, survival, -arrow (>= 1.0.0) +arrow (>= 10.0.0) Collate: 'schema.R' 'generics.R' - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (3fcc0f7ac142 -> 2613516110a4)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 3fcc0f7ac142 [SPARK-47923][R] Upgrade the minimum version of `arrow` R package to 10.0.0 add 2613516110a4 [SPARK-47924][CORE] Add a DEBUG log to `DiskStore.moveFileToBlock` No new revisions were added by this update. Summary of changes: core/src/main/scala/org/apache/spark/storage/DiskStore.scala | 1 + 1 file changed, 1 insertion(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated (afd99d19a2b8 -> 6a358ff7d633)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git from afd99d19a2b8 [SPARK-47897][SQL][3.5] Fix ExpressionSet performance regression in scala 2.12 add 6a358ff7d633 [SPARK-47825][DSTREAMS][3.5] Make `KinesisTestUtils` & `WriteInputFormatTestDataGenerator` deprecated No new revisions were added by this update. Summary of changes: .../org/apache/spark/api/python/WriteInputFormatTestDataGenerator.scala | 2 ++ 1 file changed, 2 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated (bcaf61b975d6 -> e7a2e5a196a8)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git from bcaf61b975d6 [SPARK-47897][SQL][3.5] Fix ExpressionSet performance regression in scala 2.12 add e7a2e5a196a8 [SPARK-47828][CONNECT][PYTHON][3.4] DataFrameWriterV2.overwrite fails with invalid plan No new revisions were added by this update. Summary of changes: python/pyspark/sql/connect/plan.py | 8 python/pyspark/sql/tests/test_readwriter.py | 7 ++- 2 files changed, 10 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47915][BUILD][K8S] Upgrade `kubernetes-client` to 6.12.1
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8aa8ad6be7b3 [SPARK-47915][BUILD][K8S] Upgrade `kubernetes-client` to 6.12.1 8aa8ad6be7b3 is described below commit 8aa8ad6be7b3eeceafa2ad1e9211fb8133bb675c Author: Bjørn Jørgensen AuthorDate: Fri Apr 19 08:20:17 2024 -0700 [SPARK-47915][BUILD][K8S] Upgrade `kubernetes-client` to 6.12.1 ### What changes were proposed in this pull request? Upgrade `kubernetes-client` from 6.12.0 to 6.12.1 ### Why are the changes needed? [Release notes](https://github.com/fabric8io/kubernetes-client/releases/tag/v6.12.1) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46137 from bjornjorgensen/kub-client6.12.1. Authored-by: Bjørn Jørgensen Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 50 +-- pom.xml | 2 +- 2 files changed, 26 insertions(+), 26 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 770a7522e9f7..6420c9df4d16 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -156,31 +156,31 @@ jsr305/3.0.0//jsr305-3.0.0.jar jta/1.1//jta-1.1.jar jul-to-slf4j/2.0.13//jul-to-slf4j-2.0.13.jar kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar -kubernetes-client-api/6.12.0//kubernetes-client-api-6.12.0.jar -kubernetes-client/6.12.0//kubernetes-client-6.12.0.jar -kubernetes-httpclient-okhttp/6.12.0//kubernetes-httpclient-okhttp-6.12.0.jar -kubernetes-model-admissionregistration/6.12.0//kubernetes-model-admissionregistration-6.12.0.jar -kubernetes-model-apiextensions/6.12.0//kubernetes-model-apiextensions-6.12.0.jar -kubernetes-model-apps/6.12.0//kubernetes-model-apps-6.12.0.jar -kubernetes-model-autoscaling/6.12.0//kubernetes-model-autoscaling-6.12.0.jar -kubernetes-model-batch/6.12.0//kubernetes-model-batch-6.12.0.jar -kubernetes-model-certificates/6.12.0//kubernetes-model-certificates-6.12.0.jar -kubernetes-model-common/6.12.0//kubernetes-model-common-6.12.0.jar -kubernetes-model-coordination/6.12.0//kubernetes-model-coordination-6.12.0.jar -kubernetes-model-core/6.12.0//kubernetes-model-core-6.12.0.jar -kubernetes-model-discovery/6.12.0//kubernetes-model-discovery-6.12.0.jar -kubernetes-model-events/6.12.0//kubernetes-model-events-6.12.0.jar -kubernetes-model-extensions/6.12.0//kubernetes-model-extensions-6.12.0.jar -kubernetes-model-flowcontrol/6.12.0//kubernetes-model-flowcontrol-6.12.0.jar -kubernetes-model-gatewayapi/6.12.0//kubernetes-model-gatewayapi-6.12.0.jar -kubernetes-model-metrics/6.12.0//kubernetes-model-metrics-6.12.0.jar -kubernetes-model-networking/6.12.0//kubernetes-model-networking-6.12.0.jar -kubernetes-model-node/6.12.0//kubernetes-model-node-6.12.0.jar -kubernetes-model-policy/6.12.0//kubernetes-model-policy-6.12.0.jar -kubernetes-model-rbac/6.12.0//kubernetes-model-rbac-6.12.0.jar -kubernetes-model-resource/6.12.0//kubernetes-model-resource-6.12.0.jar -kubernetes-model-scheduling/6.12.0//kubernetes-model-scheduling-6.12.0.jar -kubernetes-model-storageclass/6.12.0//kubernetes-model-storageclass-6.12.0.jar +kubernetes-client-api/6.12.1//kubernetes-client-api-6.12.1.jar +kubernetes-client/6.12.1//kubernetes-client-6.12.1.jar +kubernetes-httpclient-okhttp/6.12.1//kubernetes-httpclient-okhttp-6.12.1.jar 
+kubernetes-model-admissionregistration/6.12.1//kubernetes-model-admissionregistration-6.12.1.jar +kubernetes-model-apiextensions/6.12.1//kubernetes-model-apiextensions-6.12.1.jar +kubernetes-model-apps/6.12.1//kubernetes-model-apps-6.12.1.jar +kubernetes-model-autoscaling/6.12.1//kubernetes-model-autoscaling-6.12.1.jar +kubernetes-model-batch/6.12.1//kubernetes-model-batch-6.12.1.jar +kubernetes-model-certificates/6.12.1//kubernetes-model-certificates-6.12.1.jar +kubernetes-model-common/6.12.1//kubernetes-model-common-6.12.1.jar +kubernetes-model-coordination/6.12.1//kubernetes-model-coordination-6.12.1.jar +kubernetes-model-core/6.12.1//kubernetes-model-core-6.12.1.jar +kubernetes-model-discovery/6.12.1//kubernetes-model-discovery-6.12.1.jar +kubernetes-model-events/6.12.1//kubernetes-model-events-6.12.1.jar +kubernetes-model-extensions/6.12.1//kubernetes-model-extensions-6.12.1.jar +kubernetes-model-flowcontrol/6.12.1//kubernetes-model-flowcontrol-6.12.1.jar +kubernetes-model-gatewayapi/6.12.1//kubernetes-model-gatewayapi-6.12.1.jar +kubernetes-model-metrics/6.12.1//kubernetes-model-metrics-6.12.1.jar +kubernetes-model-networking/6.12.1//kubernetes-model-networking-6.12.1.jar +kubernetes-model-node/6.12.1//kubernetes-model-node-6.12.1.jar +kubernetes
(spark) branch master updated: [SPARK-47898][SQL] Port HIVE-12270: Add DBTokenStore support to HS2 delegation token
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 074ddc282567 [SPARK-47898][SQL] Port HIVE-12270: Add DBTokenStore support to HS2 delegation token 074ddc282567 is described below commit 074ddc2825674edcea1bb7febf2c6d8b27c2e375 Author: Kent Yao AuthorDate: Thu Apr 18 10:23:11 2024 -0700 [SPARK-47898][SQL] Port HIVE-12270: Add DBTokenStore support to HS2 delegation token ### What changes were proposed in this pull request? This PR ports `HIVE-12270: Add DBTokenStore support to HS2 delegation token`. This is a partial, as tests and other diffs that are already in the upstream artifacts are not necessary. ### Why are the changes needed? This PR can reduce the usage of HMS classes in spark-thriftserver, a small step for reducing blocker for upgrading builtin Hive ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? Pass build ### Was this patch authored or co-authored using generative AI tooling? no Closes #46115 from yaooqinn/SPARK-47898. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../java/org/apache/hive/service/auth/HiveAuthFactory.java| 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java index e3316cef241c..c48f4e3ec7b0 100644 --- a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java +++ b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java @@ -27,9 +27,7 @@ import javax.security.sasl.Sasl; import org.apache.hadoop.hive.conf.HiveConf; import org.apache.hadoop.hive.conf.HiveConf.ConfVars; -import org.apache.hadoop.hive.metastore.HiveMetaStore; -import org.apache.hadoop.hive.metastore.HiveMetaStore.HMSHandler; -import org.apache.hadoop.hive.metastore.api.MetaException; +import org.apache.hadoop.hive.ql.metadata.Hive; import org.apache.hadoop.hive.shims.HadoopShims.KerberosNameShim; import org.apache.hadoop.hive.shims.ShimLoader; import org.apache.hadoop.hive.thrift.DBTokenStore; @@ -132,16 +130,15 @@ public class HiveAuthFactory { HiveConf.ConfVars.METASTORE_CLUSTER_DELEGATION_TOKEN_STORE_CLS); if (tokenStoreClass.equals(DBTokenStore.class.getName())) { -HMSHandler baseHandler = new HiveMetaStore.HMSHandler( -"new db based metaserver", conf, true); -rawStore = baseHandler.getMS(); +// Follows https://issues.apache.org/jira/browse/HIVE-12270 +rawStore = Hive.class; } delegationTokenManager.startDelegationTokenSecretManager( conf, rawStore, ServerMode.HIVESERVER2); saslServer.setSecretManager(delegationTokenManager.getSecretManager()); } -catch (MetaException|IOException e) { +catch (IOException e) { throw new TTransportException("Failed to start token manager", e); } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [MINOR][TESTS] Replace CONFIG_DIM1 with CONFIG_DIM2 in timestamp tests
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9f9cc87c1a19 [MINOR][TESTS] Replace CONFIG_DIM1 with CONFIG_DIM2 in timestamp tests 9f9cc87c1a19 is described below commit 9f9cc87c1a19f01b65840cfdbec831867277ee59 Author: Kent Yao AuthorDate: Thu Apr 18 10:20:51 2024 -0700 [MINOR][TESTS] Replace CONFIG_DIM1 with CONFIG_DIM2 in timestamp tests ### What changes were proposed in this pull request? A followup of #33640, it looks like the test purpose has 2 different dimensions ### Why are the changes needed? test fix ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46119 from yaooqinn/minor. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- sql/core/src/test/resources/sql-tests/inputs/timestamp-ltz.sql | 2 +- sql/core/src/test/resources/sql-tests/inputs/timestamp-ntz.sql | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/sql/core/src/test/resources/sql-tests/inputs/timestamp-ltz.sql b/sql/core/src/test/resources/sql-tests/inputs/timestamp-ltz.sql index 377b26c67a3e..28fe4539855c 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/timestamp-ltz.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/timestamp-ltz.sql @@ -1,6 +1,6 @@ -- timestamp_ltz literals and constructors --CONFIG_DIM1 spark.sql.timestampType=TIMESTAMP_LTZ ---CONFIG_DIM1 spark.sql.timestampType=TIMESTAMP_NTZ +--CONFIG_DIM2 spark.sql.timestampType=TIMESTAMP_NTZ select timestamp_ltz'2016-12-31 00:12:00', timestamp_ltz'2016-12-31'; diff --git a/sql/core/src/test/resources/sql-tests/inputs/timestamp-ntz.sql b/sql/core/src/test/resources/sql-tests/inputs/timestamp-ntz.sql index d744c0c19b42..07901093cfba 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/timestamp-ntz.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/timestamp-ntz.sql @@ -1,6 +1,6 @@ -- timestamp_ntz literals and constructors --CONFIG_DIM1 spark.sql.timestampType=TIMESTAMP_LTZ ---CONFIG_DIM1 spark.sql.timestampType=TIMESTAMP_NTZ +--CONFIG_DIM2 spark.sql.timestampType=TIMESTAMP_NTZ select timestamp_ntz'2016-12-31 00:12:00', timestamp_ntz'2016-12-31'; - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark-kubernetes-operator) branch main updated: [SPARK-47889][FOLLOWUP] Add `gradlew` to `.licenserc.yaml`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new fe6e74e [SPARK-47889][FOLLOWUP] Add `gradlew` to `.licenserc.yaml` fe6e74e is described below commit fe6e74ee9005f6b2a275fd92583713ebca3159a5 Author: Dongjoon Hyun AuthorDate: Thu Apr 18 09:34:14 2024 -0700 [SPARK-47889][FOLLOWUP] Add `gradlew` to `.licenserc.yaml` ### What changes were proposed in this pull request? This PR aims to add `gradlew` to `.licenserc.yaml` as a follow-up of - #4 ### Why are the changes needed? To recover CI. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No, Closes #5 from dongjoon-hyun/SPARK-47889. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .github/.licenserc.yaml | 1 + 1 file changed, 1 insertion(+) diff --git a/.github/.licenserc.yaml b/.github/.licenserc.yaml index f00689f..26ac0c1 100644 --- a/.github/.licenserc.yaml +++ b/.github/.licenserc.yaml @@ -15,5 +15,6 @@ header: - 'NOTICE' - '.asf.yaml' - '**/*.gradle' +- gradlew comment: on-failure - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark-kubernetes-operator) branch main updated: [SPARK-47889] Setup gradle as build tool for operator repository
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new 5f2c89c [SPARK-47889] Setup gradle as build tool for operator repository 5f2c89c is described below commit 5f2c89cea4aa04c4439a6de651ea4cfcead95015 Author: zhou-jiang AuthorDate: Thu Apr 18 09:27:50 2024 -0700 [SPARK-47889] Setup gradle as build tool for operator repository This is a breakdown from #2 : set up [gradle](https://gradle.org/) as the build-tool for operator Closes #4 from jiangzho/gradle. Authored-by: zhou-jiang Signed-off-by: Dongjoon Hyun --- .github/.licenserc.yaml | 1 + .gitignore | 10 ++ LICENSE | 10 ++ build.gradle | 12 ++ gradle/wrapper/gradle-wrapper.properties | 24 +++ gradlew | 253 +++ 6 files changed, 310 insertions(+) diff --git a/.github/.licenserc.yaml b/.github/.licenserc.yaml index e9d1245..f00689f 100644 --- a/.github/.licenserc.yaml +++ b/.github/.licenserc.yaml @@ -14,5 +14,6 @@ header: - 'LICENSE' - 'NOTICE' - '.asf.yaml' +- '**/*.gradle' comment: on-failure diff --git a/.gitignore b/.gitignore index 78213f8..5e0e9b6 100644 --- a/.gitignore +++ b/.gitignore @@ -6,3 +6,13 @@ .vscode /lib/ target/ + +# Gradle Files # + +.gradle +.m2 +.out/ +build +dependencies.lock +**/dependencies.lock +gradle/wrapper/gradle-wrapper.jar diff --git a/LICENSE b/LICENSE index 261eeb9..bde9e98 100644 --- a/LICENSE +++ b/LICENSE @@ -199,3 +199,13 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. + + + +This product includes a gradle wrapper. + +* gradlew and gradle/wrapper/gradle-wrapper.properties + +Copyright: 2015-2021 Gradle Authors. +Home page: https://github.com/gradle/gradle +License: https://www.apache.org/licenses/LICENSE-2.0 diff --git a/build.gradle b/build.gradle new file mode 100644 index 000..6732f5a --- /dev/null +++ b/build.gradle @@ -0,0 +1,12 @@ +subprojects { + apply plugin: 'idea' + apply plugin: 'eclipse' + apply plugin: 'java' + sourceCompatibility = 17 + targetCompatibility = 17 + + repositories { + mavenCentral() + jcenter() + } +} diff --git a/gradle/wrapper/gradle-wrapper.properties b/gradle/wrapper/gradle-wrapper.properties new file mode 100644 index 000..9c87f96 --- /dev/null +++ b/gradle/wrapper/gradle-wrapper.properties @@ -0,0 +1,24 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +distributionBase=GRADLE_USER_HOME +distributionPath=wrapper/dists +distributionSha256Sum=194717442575a6f96e1c1befa2c30e9a4fc90f701d7aee33eb879b79e7ff05c0 +distributionUrl=https\://services.gradle.org/distributions/gradle-8.7-all.zip +networkTimeout=1 +zipStoreBase=GRADLE_USER_HOME +zipStorePath=wrapper/dists diff --git a/gradlew b/gradlew new file mode 100755 index 000..369a55f --- /dev/null +++ b/gradlew @@ -0,0 +1,253 @@ +#!/bin/sh + +# +# Copyright © 2015-2021 the original authors. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# https://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +## +# +# Gradle start up script for POSIX generated by Gradle. +# +# Important for running: +# +# (1) You need a
svn commit: r68631 - /release/spark/spark-3.4.2/
Author: dongjoon Date: Thu Apr 18 15:12:12 2024 New Revision: 68631 Log: Remove Apache Spark 3.4.2 after releasing 3.4.3 Removed: release/spark/spark-3.4.2/ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47887][CONNECT] Remove unused import `spark/connect/common.proto` from `spark/connect/relations.proto`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6232085227ee [SPARK-47887][CONNECT] Remove unused import `spark/connect/common.proto` from `spark/connect/relations.proto` 6232085227ee is described below commit 6232085227ee2cc4e831996a1ac84c27868a1595 Author: yangjie01 AuthorDate: Thu Apr 18 07:50:00 2024 -0700 [SPARK-47887][CONNECT] Remove unused import `spark/connect/common.proto` from `spark/connect/relations.proto` ### What changes were proposed in this pull request? SPARK-46812 | [https://github.com/apache/spark/pull/45232](https://github.com/apache/spark/pull/45232/files#diff-5b26ee7d224ae355b252d713e570cb03eaecbf7f8adcdb6287dc40c370b71462R26) added an unused import `spark/connect/common.proto` to `spark/connect/relations.proto`, this pr just remove it. ### Why are the changes needed? Fix compilation warning: ``` spark/connect/relations.proto:26:1: warning: Import spark/connect/common.proto is unused. ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #46106 from LuciferYang/SPARK-47887. Authored-by: yangjie01 Signed-off-by: Dongjoon Hyun --- .../main/protobuf/spark/connect/relations.proto| 1 - python/pyspark/sql/connect/proto/relations_pb2.py | 303 ++--- 2 files changed, 151 insertions(+), 153 deletions(-) diff --git a/connector/connect/common/src/main/protobuf/spark/connect/relations.proto b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto index 5cbe6459d226..3882b2e85396 100644 --- a/connector/connect/common/src/main/protobuf/spark/connect/relations.proto +++ b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto @@ -23,7 +23,6 @@ import "google/protobuf/any.proto"; import "spark/connect/expressions.proto"; import "spark/connect/types.proto"; import "spark/connect/catalog.proto"; -import "spark/connect/common.proto"; option java_multiple_files = true; option java_package = "org.apache.spark.connect.proto"; diff --git a/python/pyspark/sql/connect/proto/relations_pb2.py b/python/pyspark/sql/connect/proto/relations_pb2.py index 467d0610bbc6..5bf3901ee545 100644 --- a/python/pyspark/sql/connect/proto/relations_pb2.py +++ b/python/pyspark/sql/connect/proto/relations_pb2.py @@ -32,11 +32,10 @@ from google.protobuf import any_pb2 as google_dot_protobuf_dot_any__pb2 from pyspark.sql.connect.proto import expressions_pb2 as spark_dot_connect_dot_expressions__pb2 from pyspark.sql.connect.proto import types_pb2 as spark_dot_connect_dot_types__pb2 from pyspark.sql.connect.proto import catalog_pb2 as spark_dot_connect_dot_catalog__pb2 -from pyspark.sql.connect.proto import common_pb2 as spark_dot_connect_dot_common__pb2 DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile( - b'\n\x1dspark/connect/relations.proto\x12\rspark.connect\x1a\x19google/protobuf/any.proto\x1a\x1fspark/connect/expressions.proto\x1a\x19spark/connect/types.proto\x1a\x1bspark/connect/catalog.proto\x1a\x1aspark/connect/common.proto"\xe9\x1a\n\x08Relation\x12\x35\n\x06\x63ommon\x18\x01 \x01(\x0b\x32\x1d.spark.connect.RelationCommonR\x06\x63ommon\x12)\n\x04read\x18\x02 \x01(\x0b\x32\x13.spark.connect.ReadH\x00R\x04read\x12\x32\n\x07project\x18\x03 \x01(\x0b\x32\x16.spark.connect.Project [...] 
+ b'\n\x1dspark/connect/relations.proto\x12\rspark.connect\x1a\x19google/protobuf/any.proto\x1a\x1fspark/connect/expressions.proto\x1a\x19spark/connect/types.proto\x1a\x1bspark/connect/catalog.proto"\xe9\x1a\n\x08Relation\x12\x35\n\x06\x63ommon\x18\x01 \x01(\x0b\x32\x1d.spark.connect.RelationCommonR\x06\x63ommon\x12)\n\x04read\x18\x02 \x01(\x0b\x32\x13.spark.connect.ReadH\x00R\x04read\x12\x32\n\x07project\x18\x03 \x01(\x0b\x32\x16.spark.connect.ProjectH\x00R\x07project\x12/\n\x06\x66il [...] ) _builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, globals()) @@ -66,154 +65,154 @@ if _descriptor._USE_C_DESCRIPTORS == False: _WITHCOLUMNSRENAMED.fields_by_name["rename_columns_map"]._serialized_options = b"\030\001" _PARSE_OPTIONSENTRY._options = None _PARSE_OPTIONSENTRY._serialized_options = b"8\001" -_RELATION._serialized_start = 193 -_RELATION._serialized_end = 3626 -_UNKNOWN._serialized_start = 3628 -_UNKNOWN._serialized_end = 3637 -_RELATIONCOMMON._serialized_start = 3639 -_RELATIONCOMMON._serialized_end = 3730 -_SQL._serialized_start = 3733 -_SQL._serialized_end = 4211 -_SQL_ARGSENTRY._serialized_start = 40
(spark) branch master updated: [SPARK-47893][BUILD] Upgrade ASM to 9.7
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 51ca47da6c1a [SPARK-47893][BUILD] Upgrade ASM to 9.7 51ca47da6c1a is described below commit 51ca47da6c1ab9da8e68de0a0418a6a59457f7f8 Author: panbingkun AuthorDate: Thu Apr 18 07:48:12 2024 -0700 [SPARK-47893][BUILD] Upgrade ASM to 9.7 ### What changes were proposed in this pull request? This PR aims to upgrade ASM to 9.7. ### Why are the changes needed? xbean-asm9-shaded 4.25 upgrade to use `ASM 9.7` and `ASM 9.7` is for `Java 23`. https://asm.ow2.io/versions.html https://github.com/apache/geronimo-xbean/pull/40 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46110 from panbingkun/SPARK-47893. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 4 ++-- project/plugins.sbt | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 71a87a9f519d..45a4d499e513 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -272,7 +272,7 @@ transaction-api/1.1//transaction-api-1.1.jar txw2/3.0.2//txw2-3.0.2.jar univocity-parsers/2.9.1//univocity-parsers-2.9.1.jar wildfly-openssl/1.1.3.Final//wildfly-openssl-1.1.3.Final.jar -xbean-asm9-shaded/4.24//xbean-asm9-shaded-4.24.jar +xbean-asm9-shaded/4.25//xbean-asm9-shaded-4.25.jar xmlschema-core/2.3.1//xmlschema-core-2.3.1.jar xz/1.9//xz-1.9.jar zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar diff --git a/pom.xml b/pom.xml index e6b37610adb2..682365d9704a 100644 --- a/pom.xml +++ b/pom.xml @@ -118,7 +118,7 @@ 3.9.6 3.2.0 spark -9.6 +9.7 2.0.13 2.22.1 @@ -481,7 +481,7 @@ org.apache.xbean xbean-asm9-shaded -4.24 +4.25
(spark) tag v3.4.3 created (now 1eb558c3a6fb)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to tag v3.4.3 in repository https://gitbox.apache.org/repos/asf/spark.git at 1eb558c3a6fb (commit) No new revisions were added by this update. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r68618 - /dev/spark/v3.4.3-rc2-bin/ /release/spark/spark-3.4.3/
Author: dongjoon Date: Thu Apr 18 08:09:41 2024 New Revision: 68618 Log: Release Apache Spark 3.4.3 Added: release/spark/spark-3.4.3/ - copied from r68617, dev/spark/v3.4.3-rc2-bin/ Removed: dev/spark/v3.4.3-rc2-bin/ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47896][BUILD] Upgrade netty to `4.1.109.Final`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new eb8688c2b6ce [SPARK-47896][BUILD] Upgrade netty to `4.1.109.Final` eb8688c2b6ce is described below commit eb8688c2b6cebb319511ca3102fc0f933adbafa2 Author: panbingkun AuthorDate: Wed Apr 17 23:09:06 2024 -0700 [SPARK-47896][BUILD] Upgrade netty to `4.1.109.Final` ### What changes were proposed in this pull request? The pr aims to upgrade `netty` from `4.1.108.Final` to `4.1.109.Final`. ### Why are the changes needed? https://netty.io/news/2024/04/15/4-1-109-Final.html This version has brought some bug fixes and improvements, such as: - Fix DefaultChannelId#asLongText NPE ([#13971](https://github.com/netty/netty/pull/13971)) - Rewrite ZstdDecoder to remove the need of allocate a huge byte[] internally ([#13928](https://github.com/netty/netty/pull/13928)) - Don't send a RST frame when closing the stream in a write future while processing inbound frames ([#13973](https://github.com/netty/netty/pull/13973)) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46112 from panbingkun/netty_for_spark4. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 38 +-- pom.xml | 2 +- 2 files changed, 20 insertions(+), 20 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 54e54a108904..71a87a9f519d 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -198,16 +198,16 @@ metrics-jmx/4.2.25//metrics-jmx-4.2.25.jar metrics-json/4.2.25//metrics-json-4.2.25.jar metrics-jvm/4.2.25//metrics-jvm-4.2.25.jar minlog/1.3.0//minlog-1.3.0.jar -netty-all/4.1.108.Final//netty-all-4.1.108.Final.jar -netty-buffer/4.1.108.Final//netty-buffer-4.1.108.Final.jar -netty-codec-http/4.1.108.Final//netty-codec-http-4.1.108.Final.jar -netty-codec-http2/4.1.108.Final//netty-codec-http2-4.1.108.Final.jar -netty-codec-socks/4.1.108.Final//netty-codec-socks-4.1.108.Final.jar -netty-codec/4.1.108.Final//netty-codec-4.1.108.Final.jar -netty-common/4.1.108.Final//netty-common-4.1.108.Final.jar -netty-handler-proxy/4.1.108.Final//netty-handler-proxy-4.1.108.Final.jar -netty-handler/4.1.108.Final//netty-handler-4.1.108.Final.jar -netty-resolver/4.1.108.Final//netty-resolver-4.1.108.Final.jar +netty-all/4.1.109.Final//netty-all-4.1.109.Final.jar +netty-buffer/4.1.109.Final//netty-buffer-4.1.109.Final.jar +netty-codec-http/4.1.109.Final//netty-codec-http-4.1.109.Final.jar +netty-codec-http2/4.1.109.Final//netty-codec-http2-4.1.109.Final.jar +netty-codec-socks/4.1.109.Final//netty-codec-socks-4.1.109.Final.jar +netty-codec/4.1.109.Final//netty-codec-4.1.109.Final.jar +netty-common/4.1.109.Final//netty-common-4.1.109.Final.jar +netty-handler-proxy/4.1.109.Final//netty-handler-proxy-4.1.109.Final.jar +netty-handler/4.1.109.Final//netty-handler-4.1.109.Final.jar +netty-resolver/4.1.109.Final//netty-resolver-4.1.109.Final.jar netty-tcnative-boringssl-static/2.0.61.Final//netty-tcnative-boringssl-static-2.0.61.Final.jar netty-tcnative-boringssl-static/2.0.65.Final/linux-aarch_64/netty-tcnative-boringssl-static-2.0.65.Final-linux-aarch_64.jar 
netty-tcnative-boringssl-static/2.0.65.Final/linux-x86_64/netty-tcnative-boringssl-static-2.0.65.Final-linux-x86_64.jar @@ -215,15 +215,15 @@ netty-tcnative-boringssl-static/2.0.65.Final/osx-aarch_64/netty-tcnative-borings netty-tcnative-boringssl-static/2.0.65.Final/osx-x86_64/netty-tcnative-boringssl-static-2.0.65.Final-osx-x86_64.jar netty-tcnative-boringssl-static/2.0.65.Final/windows-x86_64/netty-tcnative-boringssl-static-2.0.65.Final-windows-x86_64.jar netty-tcnative-classes/2.0.65.Final//netty-tcnative-classes-2.0.65.Final.jar -netty-transport-classes-epoll/4.1.108.Final//netty-transport-classes-epoll-4.1.108.Final.jar -netty-transport-classes-kqueue/4.1.108.Final//netty-transport-classes-kqueue-4.1.108.Final.jar -netty-transport-native-epoll/4.1.108.Final/linux-aarch_64/netty-transport-native-epoll-4.1.108.Final-linux-aarch_64.jar -netty-transport-native-epoll/4.1.108.Final/linux-riscv64/netty-transport-native-epoll-4.1.108.Final-linux-riscv64.jar -netty-transport-native-epoll/4.1.108.Final/linux-x86_64/netty-transport-native-epoll-4.1.108.Final-linux-x86_64.jar -netty-transport-native-kqueue/4.1.108.Final/osx-aarch_64/netty-transport-native-kqueue-4.1.108.Final-osx-aarch_64.jar -netty-transport-native-kqueue/4.1.108.Final/osx-x86_64/netty-transport-native-kqueue-4.1.108.Final-osx-x86_64.jar -netty-transport
(spark) branch master updated: [SPARK-47882][SQL] createTableColumnTypes need to be mapped to database types instead of using directly
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 47d783bc6489 [SPARK-47882][SQL] createTableColumnTypes need to be mapped to database types instead of using directly 47d783bc6489 is described below commit 47d783bc64897c85294a32d5ea2ca0ec8a655ea7 Author: Kent Yao AuthorDate: Wed Apr 17 20:34:16 2024 -0700 [SPARK-47882][SQL] createTableColumnTypes need to be mapped to database types instead of using directly ### What changes were proposed in this pull request? createTableColumnTypes contains Spark SQL data type definitions. The underlying database might not recognize them, boolean for Oracle(v < 23c). ### Why are the changes needed? bugfix ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? new test ### Was this patch authored or co-authored using generative AI tooling? no Closes #46093 from yaooqinn/SPARK-47882. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../spark/sql/execution/datasources/jdbc/JdbcUtils.scala | 14 -- .../test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala | 12 ++-- .../scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala | 8 +--- 3 files changed, 23 insertions(+), 11 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala index fd7be9d0ea41..c541ec16fc82 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala @@ -878,16 +878,15 @@ object JdbcUtils extends Logging with SQLConfHelper { * Compute the schema string for this RDD. */ def schemaString( + dialect: JdbcDialect, schema: StructType, caseSensitive: Boolean, - url: String, createTableColumnTypes: Option[String] = None): String = { val sb = new StringBuilder() -val dialect = JdbcDialects.get(url) val userSpecifiedColTypesMap = createTableColumnTypes - .map(parseUserSpecifiedCreateTableColumnTypes(schema, caseSensitive, _)) + .map(parseUserSpecifiedCreateTableColumnTypes(dialect, schema, caseSensitive, _)) .getOrElse(Map.empty[String, String]) -schema.fields.foreach { field => +schema.foreach { field => val name = dialect.quoteIdentifier(field.name) val typ = userSpecifiedColTypesMap .getOrElse(field.name, getJdbcType(field.dataType, dialect).databaseTypeDefinition) @@ -903,6 +902,7 @@ object JdbcUtils extends Logging with SQLConfHelper { * use in-place of the default data type. 
*/ private def parseUserSpecifiedCreateTableColumnTypes( + dialect: JdbcDialect, schema: StructType, caseSensitive: Boolean, createTableColumnTypes: String): Map[String, String] = { @@ -919,7 +919,9 @@ object JdbcUtils extends Logging with SQLConfHelper { } } -val userSchemaMap = userSchema.fields.map(f => f.name -> f.dataType.catalogString).toMap +val userSchemaMap = userSchema + .map(f => f.name -> getJdbcType(f.dataType, dialect).databaseTypeDefinition) + .toMap if (caseSensitive) userSchemaMap else CaseInsensitiveMap(userSchemaMap) } @@ -988,7 +990,7 @@ object JdbcUtils extends Logging with SQLConfHelper { val statement = conn.createStatement val dialect = JdbcDialects.get(options.url) val strSchema = schemaString( - schema, caseSensitive, options.url, options.createTableColumnTypes) + dialect, schema, caseSensitive, options.createTableColumnTypes) try { statement.setQueryTimeout(options.queryTimeout) dialect.createTable(statement, tableName, strSchema, options) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala index 5915a44b7954..34c554f7d37e 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala @@ -1372,9 +1372,9 @@ class JDBCSuite extends QueryTest with SharedSparkSession { test("SPARK-16387: Reserved SQL words are not escaped by JDBC writer") { val df = spark.createDataset(Seq("a", "b", "c")).toDF("order") val schema = JdbcUtils.schemaString( + JdbcDialects.get("jdbc:mysql://localhost:3306/temp"), df.schema, - df.sparkSession.sessionState.conf.caseSensitiveAnalysis, - "jdbc:mysql://localhost:3306/temp") + df.sparkSes
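Editorial note: for context, here is a minimal sketch of where `createTableColumnTypes` enters the picture; the JDBC URL, table name and credentials below are placeholders invented for illustration and are not from the commit. The user supplies Spark SQL type names in the option, and with this fix they are translated through the target dialect's JDBC type mapping before the CREATE TABLE statement is built, so a type like BOOLEAN no longer reaches an Oracle version that cannot parse it.

```
// Hypothetical usage sketch; connection details are placeholders.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("createTableColumnTypes-demo").getOrCreate()
import spark.implicits._

val df = Seq((1, true), (2, false)).toDF("id", "flag")

df.write
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//db-host:1521/service")   // placeholder URL
  .option("dbtable", "demo_flags")                             // placeholder table
  .option("user", "demo").option("password", "demo")           // placeholder credentials
  // Spark SQL type names; after SPARK-47882 these are mapped to the dialect's
  // database types (instead of being emitted verbatim) when the table is created.
  .option("createTableColumnTypes", "id INT, flag BOOLEAN")
  .save()
```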
(spark) branch master updated: [SPARK-47894][CORE][WEBUI] Add `Environment` page to Master UI
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 24b4581fa818 [SPARK-47894][CORE][WEBUI] Add `Environment` page to Master UI 24b4581fa818 is described below commit 24b4581fa818da89a5aff57437addcece707e678 Author: Dongjoon Hyun AuthorDate: Wed Apr 17 20:29:54 2024 -0700 [SPARK-47894][CORE][WEBUI] Add `Environment` page to Master UI ### What changes were proposed in this pull request? This PR aims to add `Environment` page to `Spark Master UI`. ### Why are the changes needed? To improve `Spark Standalone` cluster UX by providing `Spark Master` JVM's information - `Runtime Information` - `Spark Properties` - `Hadoop Properties` - `System Properties` - `Metrics Properties` - `Classpath Entries` https://github.com/apache/spark/assets/9700541/2b02abbd-e08f-4b0f-834a-160ea6fd00c7;> https://github.com/apache/spark/assets/9700541/664d113a-b677-41a7-9e8c-841e087aae1d;> ### Does this PR introduce _any_ user-facing change? Yes, but this is a new UI. ### How was this patch tested? Pass the CIs with the newly added test case. Or manual check the UI after running `Master`. ``` $ SPARK_MASTER_OPTS="-Dspark.master.rest.enabled=true -Dspark.deploy.maxDrivers=2" sbin/start-master.sh ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46111 from dongjoon-hyun/SPARK-47894. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../spark/deploy/master/ui/EnvironmentPage.scala | 141 + .../apache/spark/deploy/master/ui/MasterPage.scala | 5 +- .../spark/deploy/master/ui/MasterWebUI.scala | 5 + .../master/ui/ReadOnlyMasterWebUISuite.scala | 14 +- 4 files changed, 162 insertions(+), 3 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/master/ui/EnvironmentPage.scala b/core/src/main/scala/org/apache/spark/deploy/master/ui/EnvironmentPage.scala new file mode 100644 index ..190e821524ba --- /dev/null +++ b/core/src/main/scala/org/apache/spark/deploy/master/ui/EnvironmentPage.scala @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.deploy.master.ui + +import scala.xml.Node + +import jakarta.servlet.http.HttpServletRequest + +import org.apache.spark.{SparkConf, SparkEnv} +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.ui._ +import org.apache.spark.util.Utils + +private[ui] class EnvironmentPage( +parent: MasterWebUI, +conf: SparkConf) extends WebUIPage("Environment") { + + def render(request: HttpServletRequest): Seq[Node] = { +val details = SparkEnv.environmentDetails(conf, SparkHadoopUtil.get.newConfiguration(conf), + "", Seq.empty, Seq.empty, Seq.empty, Map.empty) +val jvmInformation = details("JVM Information").sorted +val sparkProperties = Utils.redact(conf, details("Spark Properties")).sorted +val hadoopProperties = Utils.redact(conf, details("Hadoop Properties")).sorted +val systemProperties = Utils.redact(conf, details("System Properties")).sorted +val metricsProperties = Utils.redact(conf, details("Metrics Properties")).sorted +val classpathEntries = details("Classpath Entries").sorted + +val runtimeInformationTable = UIUtils.listingTable(propertyHeader, propertyRow, + jvmInformation, fixedWidth = true, headerClasses = headerClasses) +val sparkPropertiesTable = UIUtils.listingTable(propertyHeader, propertyRow, + sparkProperties, fixedWidth = true, headerClasses = headerClasses) +val hadoopPropertiesTable = UIUtils.listingTable(propertyHeader, propertyRow, + hadoopProperties, fixedWidth = true, headerClasses = headerClasses) +val systemPropertiesTable = UIUtils.listingTable(propertyHeader, pro
(spark) branch master updated: [SPARK-47726][DOC] Document push-based shuffle metrics
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new be6eef97a81c [SPARK-47726][DOC] Document push-based shuffle metrics be6eef97a81c is described below commit be6eef97a81c147272d5bee09afc5d423586762f Author: Luca Canali AuthorDate: Wed Apr 17 09:35:05 2024 -0700 [SPARK-47726][DOC] Document push-based shuffle metrics ### What changes were proposed in this pull request? This adds documentation for the push-based shuffle metrics ### Why are the changes needed? The push-based shuffle metrics are currently not documented ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? No Closes #45872 from LucaCanali/documentPushBasedShuffle. Authored-by: Luca Canali Signed-off-by: Dongjoon Hyun --- docs/monitoring.md | 11 +++ 1 file changed, 11 insertions(+) diff --git a/docs/monitoring.md b/docs/monitoring.md index 5e11d5aef81e..a008b71c3fe9 100644 --- a/docs/monitoring.md +++ b/docs/monitoring.md @@ -1301,6 +1301,17 @@ These metrics are exposed by Spark executors. - shuffleRemoteBytesReadToDisk.count - shuffleTotalBytesRead.count - shuffleWriteTime.count + - Metrics related to push-based shuffle: +- shuffleCorruptMergedBlockChunks +- shuffleMergedFetchFallbackCount +- shuffleMergedRemoteBlocksFetched +- shuffleMergedLocalBlocksFetched +- shuffleMergedRemoteChunksFetched +- shuffleMergedLocalChunksFetched +- shuffleMergedRemoteBytesRead +- shuffleMergedLocalBytesRead +- shuffleRemoteReqsDuration +- shuffleMergedRemoteReqsDuration - succeededTasks.count - threadpool.activeTasks - threadpool.completeTasks - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
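Editorial note: the executor metrics documented above only report meaningful values when push-based shuffle is actually in use. As a rough, hedged sketch (server-side requirements such as an external shuffle service that supports merged shuffle are omitted here), the client-side switches look like the following Scala snippet, which is illustrative and not part of the commit:

```
// Minimal client-side sketch; push-based shuffle additionally requires an
// external shuffle service configured for merged shuffle (details omitted).
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("push-based-shuffle-demo")
  .set("spark.shuffle.service.enabled", "true")  // external shuffle service required
  .set("spark.shuffle.push.enabled", "true")     // enable block push on the client side
```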
(spark) branch master updated: [SPARK-47886][SQL][DOCS][TESTS] Postgres: Add tests and doc for Postgres special numeric values
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 19833d92f325 [SPARK-47886][SQL][DOCS][TESTS] Postgres: Add tests and doc for Postgres special numeric values 19833d92f325 is described below commit 19833d92f3258ea2b4dcf803217e7a7334ecd927 Author: Kent Yao AuthorDate: Wed Apr 17 07:50:35 2024 -0700 [SPARK-47886][SQL][DOCS][TESTS] Postgres: Add tests and doc for Postgres special numeric values ### What changes were proposed in this pull request? This PR added tests and doc for Postgres special numeric values. Postgres supports special numeric values "NaN", "infinity", "-infinity" for both exact and inexact numbers, while we only support these for inexact ones. ### Why are the changes needed? test coverage and doc improvement ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? new test and doc build ![image](https://github.com/apache/spark/assets/8326978/4e46be31-981d-4625-91f2-f81c4d40abed) ### Was this patch authored or co-authored using generative AI tooling? no Closes #46102 from yaooqinn/SPARK-47886. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../spark/sql/jdbc/PostgresIntegrationSuite.scala | 30 -- docs/sql-data-sources-jdbc.md | 2 +- 2 files changed, 29 insertions(+), 3 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala index 1cd8a77e8442..8c0a7c0a809f 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala @@ -18,12 +18,13 @@ package org.apache.spark.sql.jdbc import java.math.{BigDecimal => JBigDecimal} -import java.sql.{Connection, Date, Timestamp} +import java.sql.{Connection, Date, SQLException, Timestamp} import java.text.SimpleDateFormat import java.time.LocalDateTime import java.util.Properties -import org.apache.spark.sql.{Column, Row} +import org.apache.spark.SparkException +import org.apache.spark.sql.{Column, DataFrame, Row} import org.apache.spark.sql.catalyst.expressions.Literal import org.apache.spark.sql.types._ import org.apache.spark.tags.DockerTest @@ -554,4 +555,29 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite { .option("query", "SELECT 1::oid, 'bar'::regclass, 'integer'::regtype").load() checkAnswer(df, Row(1, "bar", "integer")) } + + test("SPARK-47886: special number values") { +def toDF(qry: String): DataFrame = { + spark.read.format("jdbc") +.option("url", jdbcUrl) +.option("query", qry) +.load() +} +checkAnswer( + toDF("SELECT 'NaN'::float8 c1, 'infinity'::float8 c2, '-infinity'::float8 c3"), + Row(Double.NaN, Double.PositiveInfinity, Double.NegativeInfinity)) +checkAnswer( + toDF("SELECT 'NaN'::float4 c1, 'infinity'::float4 c2, '-infinity'::float4 c3"), + Row(Float.NaN, Float.PositiveInfinity, Float.NegativeInfinity) +) + +Seq("NaN", "infinity", "-infinity").foreach { v => + val df = toDF(s"SELECT '$v'::numeric c1") + val e = intercept[SparkException](df.collect()) + checkError(e, null) + val cause = e.getCause.asInstanceOf[SQLException] + assert(cause.getMessage.contains("Bad value for type BigDecimal")) + assert(cause.getSQLState === 
"22003") +} + } } diff --git a/docs/sql-data-sources-jdbc.md b/docs/sql-data-sources-jdbc.md index ef7a07a82c5f..637efc24113e 100644 --- a/docs/sql-data-sources-jdbc.md +++ b/docs/sql-data-sources-jdbc.md @@ -845,7 +845,7 @@ as the activated JDBC Driver. Note that, different JDBC drivers, or different ve numeric, decimal DecimalType - Since PostgreSQL 15, 's' can be negative. If 's<0' it'll be adjusted to DecimalType(min(p-s, 38), 0); Otherwise, DecimalType(p, s), and if 'p>38', the fraction part will be truncated if exceeded. And if any value of this column have an actual precision greater 38 will fail with NUMERIC_VALUE_OUT_OF_RANGE.WITHOUT_SUGGESTION error + Since PostgreSQL 15, 's' can be negative. If 's<0' it'll be adjusted to DecimalType(min(p-s, 38), 0); Otherwise, DecimalType(p, s)If 'p>38', the fraction part will be truncated if exceeded. And
(spark) branch master updated (4e754f778fdc -> 6fb2f7c3772a)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 4e754f778fdc [SPARK-47822][SQL] Prohibit Hash Expressions from hashing the Variant Data Type add 6fb2f7c3772a [SPARK-4][SQL] Use ANSI SQL mode by default No new revisions were added by this update. Summary of changes: docs/sql-migration-guide.md | 1 + sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
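Editorial note: the one-line default change above (`spark.sql.ansi.enabled` flipped to `true`) changes runtime semantics, not just parsing. A small illustrative Scala sketch, assuming a spark-shell session where `spark` is available (not part of the commit):

```
// Legacy (non-ANSI) behavior: invalid casts and overflows silently yield NULL.
spark.sql("SET spark.sql.ansi.enabled=false")
spark.sql("SELECT CAST('not a number' AS INT)").show()      // -> NULL

// New default: the same cast raises a runtime error (a CAST_INVALID_INPUT-style
// error class), so the failure is no longer silent.
spark.sql("SET spark.sql.ansi.enabled=true")
// spark.sql("SELECT CAST('not a number' AS INT)").show()   // would throw

// try_cast keeps NULL-on-error semantics explicitly under ANSI mode.
spark.sql("SELECT TRY_CAST('not a number' AS INT)").show()  // -> NULL
```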
(spark) branch master updated: [SPARK-47838][BUILD] Upgrade `rocksdbjni` to 8.11.4
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ab6338e09aa0 [SPARK-47838][BUILD] Upgrade `rocksdbjni` to 8.11.4 ab6338e09aa0 is described below commit ab6338e09aa0fe06aef1c753eaaf677f766e9490 Author: Neil Ramaswamy AuthorDate: Tue Apr 16 20:11:16 2024 -0700 [SPARK-47838][BUILD] Upgrade `rocksdbjni` to 8.11.4 ### What changes were proposed in this pull request? Upgrades `rocksdbjni` dependency to 8.11.4. ### Why are the changes needed? 8.11.4 has Java-related RocksDB fixes: https://github.com/facebook/rocksdb/releases/tag/v8.11.4 - Fixed CMake Javadoc build - Fixed Java SstFileMetaData to prevent throwing java.lang.NoSuchMethodError ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - All existing UTs should pass - [In progress] Performance benchmarks ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46065 from neilramaswamy/spark-47838. Authored-by: Neil Ramaswamy Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml| 2 +- ...StoreBasicOperationsBenchmark-jdk21-results.txt | 122 +++-- .../StateStoreBasicOperationsBenchmark-results.txt | 122 +++-- 4 files changed, 126 insertions(+), 122 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 466e8d09d89e..54e54a108904 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -247,7 +247,7 @@ parquet-jackson/1.13.1//parquet-jackson-1.13.1.jar pickle/1.3//pickle-1.3.jar py4j/0.10.9.7//py4j-0.10.9.7.jar remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar -rocksdbjni/8.11.3//rocksdbjni-8.11.3.jar +rocksdbjni/8.11.4//rocksdbjni-8.11.4.jar scala-collection-compat_2.13/2.7.0//scala-collection-compat_2.13-2.7.0.jar scala-compiler/2.13.13//scala-compiler-2.13.13.jar scala-library/2.13.13//scala-library-2.13.13.jar diff --git a/pom.xml b/pom.xml index bf8d4f1b417d..7ded74b9f9df 100644 --- a/pom.xml +++ b/pom.xml @@ -687,7 +687,7 @@ org.rocksdb rocksdbjni -8.11.3 +8.11.4 ${leveldbjni.group} diff --git a/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt b/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt index 0317e6116375..953031fc1daf 100644 --- a/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt +++ b/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt @@ -2,141 +2,143 @@ put rows -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure AMD EPYC 7763 64-Core Processor putting 1 rows (1 rows to overwrite - rate 100): Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative --- -In-memory9 10 1 1.1 936.2 1.0X -RocksDB (trackTotalNumberOfRows: true) 41 42 1 0.24068.9 0.2X -RocksDB (trackTotalNumberOfRows: false) 15 16 1 0.71500.4 0.6X +In-memory9 10 1 1.1 938.9 1.0X +RocksDB (trackTotalNumberOfRows: true) 42 44 2 0.24215.2 0.2X +RocksDB (trackTotalNumberOfRows: false) 15 16 1 0.71535.3 0.6X -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure AMD EPYC 7763 64-Core Processor putting 1 rows (5000 rows to overwrite - rate 50): Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative - -In-memory 9 11 1 1.1 
929.8 1.0X -RocksDB (trackTotalNumberOfRows: true)40
(spark) branch master updated: [SPARK-47875][CORE] Remove `spark.deploy.recoverySerializer`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5321353b24db [SPARK-47875][CORE] Remove `spark.deploy.recoverySerializer` 5321353b24db is described below commit 5321353b24db247087890c44de06b9ad4e136473 Author: Dongjoon Hyun AuthorDate: Tue Apr 16 16:47:23 2024 -0700 [SPARK-47875][CORE] Remove `spark.deploy.recoverySerializer` ### What changes were proposed in this pull request? This is a logical revert of SPARK-46205 - #44113 - #44118 ### Why are the changes needed? The initial implementation didn't handle the class initialization logic properly. Until we have a fix, I'd like to revert this from `master` branch. ### Does this PR introduce _any_ user-facing change? No, this is not released yet. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46087 from dongjoon-hyun/SPARK-47875. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../PersistenceEngineBenchmark-jdk21-results.txt | 7 -- .../PersistenceEngineBenchmark-results.txt | 7 -- .../org/apache/spark/deploy/master/Master.scala| 7 ++ .../org/apache/spark/internal/config/Deploy.scala | 14 .../deploy/master/PersistenceEngineBenchmark.scala | 4 ++-- .../deploy/master/PersistenceEngineSuite.scala | 14 +--- .../apache/spark/deploy/master/RecoverySuite.scala | 25 ++ docs/spark-standalone.md | 12 ++- 8 files changed, 9 insertions(+), 81 deletions(-) diff --git a/core/benchmarks/PersistenceEngineBenchmark-jdk21-results.txt b/core/benchmarks/PersistenceEngineBenchmark-jdk21-results.txt index 2a6bd778fc8a..ae4e0071adb0 100644 --- a/core/benchmarks/PersistenceEngineBenchmark-jdk21-results.txt +++ b/core/benchmarks/PersistenceEngineBenchmark-jdk21-results.txt @@ -7,19 +7,12 @@ AMD EPYC 7763 64-Core Processor 1000 Workers: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative ZooKeeperPersistenceEngine with JavaSerializer 5036 5232 229 0.0 5035730.1 1.0X -ZooKeeperPersistenceEngine with KryoSerializer 4038 4053 16 0.0 4038447.8 1.2X FileSystemPersistenceEngine with JavaSerializer2902 2906 5 0.0 2902453.3 1.7X FileSystemPersistenceEngine with JavaSerializer (lz4) 816 829 19 0.0 816173.1 6.2X FileSystemPersistenceEngine with JavaSerializer (lzf) 755 780 33 0.0 755209.0 6.7X FileSystemPersistenceEngine with JavaSerializer (snappy)814 832 16 0.0 813672.5 6.2X FileSystemPersistenceEngine with JavaSerializer (zstd) 987 1014 45 0.0 986834.7 5.1X -FileSystemPersistenceEngine with KryoSerializer 687 698 14 0.0 687313.5 7.3X -FileSystemPersistenceEngine with KryoSerializer (lz4) 590 599 15 0.0 589867.9 8.5X -FileSystemPersistenceEngine with KryoSerializer (lzf) 915 922 9 0.0 915432.2 5.5X -FileSystemPersistenceEngine with KryoSerializer (snappy)768 795 37 0.0 768494.4 6.6X -FileSystemPersistenceEngine with KryoSerializer (zstd) 898 950 45 0.0 898118.6 5.6X RocksDBPersistenceEngine with JavaSerializer299 299 0 0.0 298800.0 16.9X -RocksDBPersistenceEngine with KryoSerializer112 113 1 0.0 111779.6 45.1X BlackHolePersistenceEngine0 0 0 5.5 180.3 27924.2X diff --git a/core/benchmarks/PersistenceEngineBenchmark-results.txt b/core/benchmarks/PersistenceEngineBenchmark-results.txt index da1838608de1..ec9a6fc1c8cf 100644 --- a/core/benchmarks/PersistenceEngineBenchmark-results.txt +++ b/core/benchmarks/PersistenceEngineBenchmark-results.txt @@ -7,19 +7,12 @@ AMD EPYC 7763 
64-Core Processor 1000 Workers: Best
(spark) branch master updated: [SPARK-47871][SQL] Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9a1fc112677f [SPARK-47871][SQL] Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE 9a1fc112677f is described below commit 9a1fc112677f98089d946b3bf4f52b33ab0a5c23 Author: Kent Yao AuthorDate: Tue Apr 16 08:35:51 2024 -0700 [SPARK-47871][SQL] Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE ### What changes were proposed in this pull request? This PR map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE ### Why are the changes needed? We currently map both TimestampType and TimestampNTZType to Oracle's TIMESTAMP which represents a timestamp without time zone. This is ambiguous ### Does this PR introduce _any_ user-facing change? It does not affect spark users to play a TimestampType read-write-read roundtrip, but might affect other systems' reading ### How was this patch tested? existing test with new configuration ```java SPARK-42627: Support ORACLE TIMESTAMP WITH LOCAL TIME ZONE (9 seconds, 536 milliseconds) ``` ### Was this patch authored or co-authored using generative AI tooling? no Closes #46080 from yaooqinn/SPARK-47871. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../spark/sql/jdbc/OracleIntegrationSuite.scala| 39 -- docs/sql-migration-guide.md| 1 + .../org/apache/spark/sql/internal/SQLConf.scala| 12 +++ .../org/apache/spark/sql/jdbc/OracleDialect.scala | 5 ++- .../org/apache/spark/sql/jdbc/JDBCSuite.scala | 5 ++- 5 files changed, 43 insertions(+), 19 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala index 418b86fb6b23..496498e5455b 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala @@ -547,23 +547,28 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSpark } test("SPARK-42627: Support ORACLE TIMESTAMP WITH LOCAL TIME ZONE") { -val reader = spark.read.format("jdbc") - .option("url", jdbcUrl) - .option("dbtable", "test_ltz") -val df = reader.load() -val row1 = df.collect().head.getTimestamp(0) -assert(df.count() === 1) -assert(row1 === Timestamp.valueOf("2018-11-17 13:33:33")) - -df.write.format("jdbc") - .option("url", jdbcUrl) - .option("dbtable", "test_ltz") - .mode("append") - .save() - -val df2 = reader.load() -assert(df.count() === 2) -assert(df2.collect().forall(_.getTimestamp(0) === row1)) +Seq("true", "false").foreach { flag => + withSQLConf((SQLConf.LEGACY_ORACLE_TIMESTAMP_MAPPING_ENABLED.key, flag)) { +val df = spark.read.format("jdbc") + .option("url", jdbcUrl) + .option("dbtable", "test_ltz") + .load() +val row1 = df.collect().head.getTimestamp(0) +assert(df.count() === 1) +assert(row1 === Timestamp.valueOf("2018-11-17 13:33:33")) + +df.write.format("jdbc") + .option("url", jdbcUrl) + .option("dbtable", "test_ltz" + flag) + .save() + +val df2 = spark.read.format("jdbc") + .option("url", jdbcUrl) + .option("dbtable", "test_ltz" + flag) + .load() +checkAnswer(df2, Row(row1)) + } +} } test("SPARK-47761: Reading ANSI INTERVAL Types") { diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 
c7bd0b55840c..3004008b8ec7 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -45,6 +45,7 @@ license: | - Since Spark 4.0, MySQL JDBC datasource will read FLOAT as FloatType, while in Spark 3.5 and previous, it was read as DoubleType. To restore the previous behavior, you can cast the column to the old type. - Since Spark 4.0, MySQL JDBC datasource will read BIT(n > 1) as BinaryType, while in Spark 3.5 and previous, read as LongType. To restore the previous behavior, set `spark.sql.legacy.mysql.bitArrayMapping.enabled` to `true`. - Since Spark 4.0, MySQL JDBC datasource will write ShortType as SMALLINT, while in Spark 3.5 and previous, write as INTEGER. To restore the pre
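To illustrate the behavior change described in SPARK-47871 above, the following is a minimal round-trip sketch (not from the patch) that writes a `TimestampType` column to Oracle over JDBC. The JDBC URL and table name are placeholders, credentials are omitted, and an Oracle JDBC driver is assumed to be on the classpath; with the new mapping the column is created as `TIMESTAMP WITH LOCAL TIME ZONE`, and the legacy mapping can be restored via the `SQLConf.LEGACY_ORACLE_TIMESTAMP_MAPPING_ENABLED` flag referenced in the test diff.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession

object OracleTimestampRoundTrip {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Placeholder connection details; substitute a reachable Oracle instance and credentials.
    val jdbcUrl = "jdbc:oracle:thin:@//localhost:1521/freepdb1"

    val df = Seq(Timestamp.valueOf("2018-11-17 13:33:33")).toDF("ts")

    // With SPARK-47871, the "ts" column is created as TIMESTAMP WITH LOCAL TIME ZONE on the Oracle side.
    df.write.format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", "ts_ltz_demo")
      .mode("append")
      .save()

    // Reading back should yield the same instant within Spark.
    spark.read.format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", "ts_ltz_demo")
      .load()
      .show(truncate = false)

    spark.stop()
  }
}
```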
(spark-kubernetes-operator) branch main updated: [SPARK-47745] Add License to Spark Operator repository
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new 7a3a7e8 [SPARK-47745] Add License to Spark Operator repository 7a3a7e8 is described below commit 7a3a7e882af2c8e8d463ebed71329212133d229c Author: zhou-jiang AuthorDate: Tue Apr 16 08:08:26 2024 -0700 [SPARK-47745] Add License to Spark Operator repository ### What changes were proposed in this pull request? This PR aims to add ASF license file. ### Why are the changes needed? To receive a code contribution. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #3 from jiangzho/license. Authored-by: zhou-jiang Signed-off-by: Dongjoon Hyun --- LICENSE | 201 1 file changed, 201 insertions(+) diff --git a/LICENSE b/LICENSE new file mode 100644 index 000..261eeb9 --- /dev/null +++ b/LICENSE @@ -0,0 +1,201 @@ + Apache License + Version 2.0, January 2004 +http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. 
+ + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated with
(spark-kubernetes-operator) branch main updated: Update GITHUB_API_BASE
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new a8eb690 Update GITHUB_API_BASE a8eb690 is described below commit a8eb690a7a85fd2b580e3756fad8d2bcf306e12c Author: Dongjoon Hyun AuthorDate: Tue Apr 16 08:06:10 2024 -0700 Update GITHUB_API_BASE --- dev/merge_spark_pr.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py index 4647383..24e956d 100755 --- a/dev/merge_spark_pr.py +++ b/dev/merge_spark_pr.py @@ -65,7 +65,7 @@ GITHUB_OAUTH_KEY = os.environ.get("GITHUB_OAUTH_KEY") GITHUB_BASE = "https://github.com/apache/spark-kubernetes-operator/pull" -GITHUB_API_BASE = "https://api.github.com/repos/spark-kubernetes-operator" +GITHUB_API_BASE = "https://api.github.com/repos/apache/spark-kubernetes-operator" JIRA_BASE = "https://issues.apache.org/jira/browse" JIRA_API_BASE = "https://issues.apache.org/jira" # Prefix added to temporary branches - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47739][SQL] Register logical avro type
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new fa2e9c7275aa [SPARK-47739][SQL] Register logical avro type fa2e9c7275aa is described below commit fa2e9c7275aa1c09652d0df0992565c32974b2b9 Author: milastdbx AuthorDate: Tue Apr 16 03:38:19 2024 -0700 [SPARK-47739][SQL] Register logical avro type ### What changes were proposed in this pull request? In this pull request I propose that we register logical avro types when we initialize `AvroUtils` and `AvroFileFormat`, otherwise for first schema discovery we might get wrong result on very first execution after spark starts. https://github.com/apache/spark/assets/150366084/3eaba6e3-34ec-4ca9-ae89-d0259ce942ba;> example ```scala val new_schema = """ | { | "type": "record", | "name": "Entry", | "fields": [ | { | "name": "rate", | "type": [ | "null", | { | "type": "long", | "logicalType": "custom-decimal", | "precision": 38, | "scale": 9 | } | ], | "default": null | } | ] | }""".stripMargin spark.read.format("avro").option("avroSchema", new_schema).load().printSchema // maps to long - WRONG spark.read.format("avro").option("avroSchema", new_schema).load().printSchema // maps to Decimal - CORRECT ``` ### Why are the changes needed? To fix issue with resolving avro schema upon spark startup. ### Does this PR introduce _any_ user-facing change? No, its a bugfix ### How was this patch tested? Unit tests ### Was this patch authored or co-authored using generative AI tooling? No Closes #45895 from milastdbx/dev/milast/fixAvroLogicalTypeRegistration. Lead-authored-by: milastdbx Co-authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/sql/avro/AvroFileFormat.scala | 21 -- .../spark/sql/avro/AvroLogicalTypeInitSuite.scala | 76 ++ 2 files changed, 91 insertions(+), 6 deletions(-) diff --git a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala index 2792edaea284..372f24b54f5c 100755 --- a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala +++ b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala @@ -43,6 +43,8 @@ import org.apache.spark.util.SerializableConfiguration private[sql] class AvroFileFormat extends FileFormat with DataSourceRegister with Logging with Serializable { + AvroFileFormat.registerCustomAvroTypes() + override def equals(other: Any): Boolean = other match { case _: AvroFileFormat => true case _ => false @@ -173,10 +175,17 @@ private[sql] class AvroFileFormat extends FileFormat private[avro] object AvroFileFormat { val IgnoreFilesWithoutExtensionProperty = "avro.mapred.ignore.inputs.without.extension" - // Register the customized decimal type backed by long. - LogicalTypes.register(CustomDecimal.TYPE_NAME, new LogicalTypes.LogicalTypeFactory { -override def fromSchema(schema: Schema): LogicalType = { - new CustomDecimal(schema) -} - }) + /** + * Register Spark defined custom Avro types. + */ + def registerCustomAvroTypes(): Unit = { +// Register the customized decimal type backed by long. 
+LogicalTypes.register(CustomDecimal.TYPE_NAME, new LogicalTypes.LogicalTypeFactory { + override def fromSchema(schema: Schema): LogicalType = { +new CustomDecimal(schema) + } +}) + } + + registerCustomAvroTypes() } diff --git a/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroLogicalTypeInitSuite.scala b/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroLogicalTypeInitSuite.scala new file mode 100644 index ..126440ed69b8 --- /dev/null +++ b/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroLogicalTypeInitSuite.scala @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file exc
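The essence of the SPARK-47739 fix above is that a custom Avro logical type must be registered before the first schema is parsed, otherwise the first read silently falls back to the backing primitive type. As a standalone illustration of the registration pattern (using a hypothetical logical type name, not Spark's `CustomDecimal`), the Avro API usage looks roughly like this:

```scala
import org.apache.avro.{LogicalType, LogicalTypes, Schema}

// A hypothetical logical type backed by Avro's long primitive.
class MyLongBackedType(schema: Schema) extends LogicalType("my-long-backed") {
  override def validate(schema: Schema): Unit = {
    super.validate(schema)
    if (schema.getType != Schema.Type.LONG) {
      throw new IllegalArgumentException("my-long-backed must annotate a long schema")
    }
  }
}

object MyLongBackedType {
  // Register eagerly (e.g. from a class initializer), so that the very first
  // schema parse already resolves the logical type instead of returning long.
  def register(): Unit = {
    LogicalTypes.register("my-long-backed", new LogicalTypes.LogicalTypeFactory {
      override def fromSchema(schema: Schema): LogicalType = new MyLongBackedType(schema)
    })
  }
}
```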
(spark) branch master updated: [SPARK-46574][BUILD] Upgrade maven plugin to latest version
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b7a729bfd19c [SPARK-46574][BUILD] Upgrade maven plugin to latest version b7a729bfd19c is described below commit b7a729bfd19cfa7a06d208f3899d329e414d5598 Author: panbingkun AuthorDate: Tue Apr 16 03:30:12 2024 -0700 [SPARK-46574][BUILD] Upgrade maven plugin to latest version ### What changes were proposed in this pull request? ### Why are the changes needed? - `exec-maven-plugin` from `3.1.0` to `3.2.0` https://github.com/mojohaus/exec-maven-plugin/releases/tag/3.2.0 https://github.com/mojohaus/exec-maven-plugin/releases/tag/3.1.1 Bug Fixes: 1.Fix https://github.com/mojohaus/exec-maven-plugin/issues/158 - Fix non ascii character handling (https://github.com/mojohaus/exec-maven-plugin/pull/372) 2.[https://github.com/mojohaus/exec-maven-plugin/issues/323] exec arguments missing (https://github.com/mojohaus/exec-maven-plugin/pull/324) - `build-helper-maven-plugin` from `3.4.0` to `3.5.0` https://github.com/mojohaus/build-helper-maven-plugin/releases/tag/3.5.0 - `maven-compiler-plugin` from `3.12.1` to `3.13.0` https://github.com/apache/maven-compiler-plugin/releases/tag/maven-compiler-plugin-3.13.0 - `maven-jar-plugin` from `3.3.0` to `3.4.0` https://github.com/apache/maven-jar-plugin/releases/tag/maven-jar-plugin-3.4.0 [[MJAR-62]](https://issues.apache.org/jira/browse/MJAR-62) - Set Build-Jdk according to used toolchain (https://github.com/apache/maven-jar-plugin/pull/73) - `maven-source-plugin` from `3.3.0` to `3.3.1` https://github.com/apache/maven-source-plugin/releases/tag/maven-source-plugin-3.3.1 - `maven-assembly-plugin` from `3.6.0` to `3.7.1` https://github.com/apache/maven-assembly-plugin/releases/tag/maven-assembly-plugin-3.7.1 https://github.com/apache/maven-assembly-plugin/releases/tag/maven-assembly-plugin-3.7.0 Bug Fixes: 1.[[MASSEMBLY-967](https://issues.apache.org/jira/browse/MASSEMBLY-967)] - maven-assembly-plugin doesn't add target/class artifacts in generated jarfat but META-INF/MANIFEST.MF seems to be correct 2.[[MASSEMBLY-994](https://issues.apache.org/jira/browse/MASSEMBLY-994)] - Items from unpacked dependency are not refreshed 3.[[MASSEMBLY-998](https://issues.apache.org/jira/browse/MASSEMBLY-998)] - Transitive dependencies are not properly excluded as of 3.1.1 4.[[MASSEMBLY-1008](https://issues.apache.org/jira/browse/MASSEMBLY-1008)] - Assembly plugin handles scopes wrongly 5.[[MASSEMBLY-1020](https://issues.apache.org/jira/browse/MASSEMBLY-1020)] - Cannot invoke "java.io.File.isFile()" because "this.inputFile" is null 6.[[MASSEMBLY-1021](https://issues.apache.org/jira/browse/MASSEMBLY-1021)] - Nullpointer in assembly:single when upgrading to 3.7.0 7.[[MASSEMBLY-1022](https://issues.apache.org/jira/browse/MASSEMBLY-1022)] - Unresolved artifacts should be not processed - `cyclonedx-maven-plugin` from `2.7.9` to `2.8.0` https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.8.0 https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.11 https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.10 Bug Fixes: 1.check if configured schemaVersion is supported (https://github.com/CycloneDX/cyclonedx-maven-plugin/pull/479) 2.ignore bomGenerator.generate() call (https://github.com/CycloneDX/cyclonedx-maven-plugin/pull/376) ### Does 
this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46043 from panbingkun/update_maven_plugins. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- pom.xml | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/pom.xml b/pom.xml index 99b238aac1dc..bf8d4f1b417d 100644 --- a/pom.xml +++ b/pom.xml @@ -116,7 +116,7 @@ 17 ${java.version} 3.9.6 -3.1.0 +3.2.0 spark 9.6 2.0.13 @@ -2994,7 +2994,7 @@ org.codehaus.mojo build-helper-maven-plugin - 3.4.0 + 3.5.0 module-timestamp-property @@ -3108,7 +3108,7 @@ org.apache.maven.plugins maven-compiler-plugin - 3.12.1 + 3.13.0 ${java.version} true @@ -3234,7 +3234,7 @@ org.apache.maven.plugins maven-jar-plugin - 3.3.0 + 3.4.0 org.apache.mave
(spark) branch branch-3.5 updated: [SPARK-46335][BUILD][3.5] Upgrade Maven to 3.9.6
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new d54f24cf3c3d [SPARK-46335][BUILD][3.5] Upgrade Maven to 3.9.6 d54f24cf3c3d is described below commit d54f24cf3c3dc8107fc143d47f7c61edb3ebdc32 Author: Dongjoon Hyun AuthorDate: Mon Apr 15 20:39:32 2024 -0700 [SPARK-46335][BUILD][3.5] Upgrade Maven to 3.9.6 ### What changes were proposed in this pull request? This PR aims to upgrade `Apache Maven` to 3.9.6 for Apache Spark 3.5.2+ This is a backport of the following PR. `Apache Maven 3.9.6` has been used over 4 months in `master` branch. - #44267 ### Why are the changes needed? To bring the latest bug fixes, - https://maven.apache.org/docs/3.9.0/release-notes.html - https://maven.apache.org/docs/3.9.1/release-notes.html - https://maven.apache.org/docs/3.9.2/release-notes.html - https://maven.apache.org/docs/3.9.3/release-notes.html - https://maven.apache.org/docs/3.9.5/release-notes.html - https://maven.apache.org/docs/3.9.6/release-notes.html ### Does this PR introduce _any_ user-facing change? No because this is a build time change. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46069 from dongjoon-hyun/SPARK-46335. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/appveyor-install-dependencies.ps1 | 2 +- docs/building-spark.md| 2 +- pom.xml | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/dev/appveyor-install-dependencies.ps1 b/dev/appveyor-install-dependencies.ps1 index 3737382eb86e..792a9aa4e979 100644 --- a/dev/appveyor-install-dependencies.ps1 +++ b/dev/appveyor-install-dependencies.ps1 @@ -81,7 +81,7 @@ if (!(Test-Path $tools)) { # == Maven # Push-Location $tools # -# $mavenVer = "3.8.8" +# $mavenVer = "3.9.6" # Start-FileDownload "https://archive.apache.org/dist/maven/maven-3/$mavenVer/binaries/apache-maven-$mavenVer-bin.zip; "maven.zip" # # # extract diff --git a/docs/building-spark.md b/docs/building-spark.md index 33d253a49dbf..4f626b4ff58c 100644 --- a/docs/building-spark.md +++ b/docs/building-spark.md @@ -27,7 +27,7 @@ license: | ## Apache Maven The Maven-based build is the build of reference for Apache Spark. -Building Spark using Maven requires Maven 3.8.8 and Java 8/11/17. +Building Spark using Maven requires Maven 3.9.6 and Java 8/11/17. Spark requires Scala 2.12/2.13; support for Scala 2.11 was removed in Spark 3.0.0. ### Setting up Maven's Memory Usage diff --git a/pom.xml b/pom.xml index 34cbefbeb3f7..6bb764e0c28c 100644 --- a/pom.xml +++ b/pom.xml @@ -115,7 +115,7 @@ 1.8 ${java.version} ${java.version} -3.8.8 +3.9.6 3.1.0 spark 9.5 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47861][BUILD] Upgrade `slf4j` to 2.0.13
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3ff339362b75 [SPARK-47861][BUILD] Upgrade `slf4j` to 2.0.13 3ff339362b75 is described below commit 3ff339362b759d5aef46a7668cbdca1f72ba289e Author: Dongjoon Hyun AuthorDate: Mon Apr 15 20:37:43 2024 -0700 [SPARK-47861][BUILD] Upgrade `slf4j` to 2.0.13 ### What changes were proposed in this pull request? This PR aims to upgrade `slf4j` to 2.0.13. ### Why are the changes needed? To bring the following bug fix, - https://www.slf4j.org/news.html#2.0.13 - https://github.com/qos-ch/slf4j/issues/409 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the Cis. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46067 from dongjoon-hyun/SPARK-47861. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 6 +++--- pom.xml | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 7f48a4327dba..466e8d09d89e 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -128,7 +128,7 @@ javax.servlet-api/4.0.1//javax.servlet-api-4.0.1.jar javolution/5.5.1//javolution-5.5.1.jar jaxb-api/2.2.11//jaxb-api-2.2.11.jar jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar -jcl-over-slf4j/2.0.12//jcl-over-slf4j-2.0.12.jar +jcl-over-slf4j/2.0.13//jcl-over-slf4j-2.0.13.jar jdo-api/3.0.1//jdo-api-3.0.1.jar jdom2/2.0.6//jdom2-2.0.6.jar jersey-client/3.0.12//jersey-client-3.0.12.jar @@ -154,7 +154,7 @@ json4s-jackson_2.13/4.0.7//json4s-jackson_2.13-4.0.7.jar json4s-scalap_2.13/4.0.7//json4s-scalap_2.13-4.0.7.jar jsr305/3.0.0//jsr305-3.0.0.jar jta/1.1//jta-1.1.jar -jul-to-slf4j/2.0.12//jul-to-slf4j-2.0.12.jar +jul-to-slf4j/2.0.13//jul-to-slf4j-2.0.13.jar kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar kubernetes-client-api/6.12.0//kubernetes-client-api-6.12.0.jar kubernetes-client/6.12.0//kubernetes-client-6.12.0.jar @@ -255,7 +255,7 @@ scala-parallel-collections_2.13/1.0.4//scala-parallel-collections_2.13-1.0.4.jar scala-parser-combinators_2.13/2.3.0//scala-parser-combinators_2.13-2.3.0.jar scala-reflect/2.13.13//scala-reflect-2.13.13.jar scala-xml_2.13/2.2.0//scala-xml_2.13-2.2.0.jar -slf4j-api/2.0.12//slf4j-api-2.0.12.jar +slf4j-api/2.0.13//slf4j-api-2.0.13.jar snakeyaml-engine/2.7//snakeyaml-engine-2.7.jar snakeyaml/2.2//snakeyaml-2.2.jar snappy-java/1.1.10.5//snappy-java-1.1.10.5.jar diff --git a/pom.xml b/pom.xml index fef2601c24db..99b238aac1dc 100644 --- a/pom.xml +++ b/pom.xml @@ -119,7 +119,7 @@ 3.1.0 spark 9.6 -2.0.12 +2.0.13 2.22.1 3.4.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-43394][BUILD] Upgrade maven to 3.8.8
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 1ee3f4e6bd79 [SPARK-43394][BUILD] Upgrade maven to 3.8.8 1ee3f4e6bd79 is described below commit 1ee3f4e6bd7974c238556c538e90dda10dc2e2b7 Author: Cheng Pan AuthorDate: Sun May 7 08:24:12 2023 -0500 [SPARK-43394][BUILD] Upgrade maven to 3.8.8 Upgrade Maven from 3.8.7 to 3.8.8. Maven 3.8.8 is the latest patched version of 3.8.x https://maven.apache.org/docs/3.8.8/release-notes.html No Pass GA. Closes #41073 from pan3793/SPARK-43394. Authored-by: Cheng Pan Signed-off-by: Sean Owen (cherry picked from commit 04ef3d5d0f2bfebce8dd3b48b9861a2aa5ba1c3a) Signed-off-by: Dongjoon Hyun --- dev/appveyor-install-dependencies.ps1 | 2 +- docs/building-spark.md| 2 +- pom.xml | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/dev/appveyor-install-dependencies.ps1 b/dev/appveyor-install-dependencies.ps1 index a369e9285a0f..88090149f5c0 100644 --- a/dev/appveyor-install-dependencies.ps1 +++ b/dev/appveyor-install-dependencies.ps1 @@ -81,7 +81,7 @@ if (!(Test-Path $tools)) { # == Maven # Push-Location $tools # -# $mavenVer = "3.8.6" +# $mavenVer = "3.8.8" # Start-FileDownload "https://archive.apache.org/dist/maven/maven-3/$mavenVer/binaries/apache-maven-$mavenVer-bin.zip; "maven.zip" # # # extract diff --git a/docs/building-spark.md b/docs/building-spark.md index be1c9062c5e2..5704da9cec85 100644 --- a/docs/building-spark.md +++ b/docs/building-spark.md @@ -27,7 +27,7 @@ license: | ## Apache Maven The Maven-based build is the build of reference for Apache Spark. -Building Spark using Maven requires Maven 3.8.6 and Java 8. +Building Spark using Maven requires Maven 3.8.8 and Java 8/11/17. Spark requires Scala 2.12/2.13; support for Scala 2.11 was removed in Spark 3.0.0. ### Setting up Maven's Memory Usage diff --git a/pom.xml b/pom.xml index 3c8d0260c4a8..282a46910902 100644 --- a/pom.xml +++ b/pom.xml @@ -113,7 +113,7 @@ 1.8 ${java.version} ${java.version} -3.8.6 +3.8.8 1.6.0 spark 2.0.6 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated (b8e2498007a0 -> 3b3903dda363)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git from b8e2498007a0 [SPARK-47318][CORE][3.5] Adds HKDF round to AuthEngine key derivation to follow standard KEX practices add 3b3903dda363 [SPARK-47828][CONNECT][PYTHON][3.5] DataFrameWriterV2.overwrite fails with invalid plan No new revisions were added by this update. Summary of changes: python/pyspark/sql/connect/plan.py | 8 python/pyspark/sql/tests/test_readwriter.py | 7 ++- 2 files changed, 10 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
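For reference, the `DataFrameWriterV2.overwrite` API exercised by the SPARK-47828 fix above looks like this from the Scala side. This is a minimal sketch only: `demo.t` is a placeholder table assumed to already exist in a V2 catalog that supports overwrite by filter.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

object OverwriteV2Example {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    // Replace all existing rows matching the condition (here: everything) with the new data.
    spark.range(5).toDF("v")
      .writeTo("demo.t") // placeholder: a table in a catalog implementing overwrite-by-filter
      .overwrite(lit(true))

    spark.stop()
  }
}
```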
(spark) branch master updated (becbca6752a5 -> 6d1b3668db42)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from becbca6752a5 [SPARK-47860][BUILD][K8S] Upgrade `kubernetes-client` to 6.12.0 add 6d1b3668db42 [SPARK-47855][CONNECT] Add `spark.sql.execution.arrow.pyspark.fallback.enabled` in the unsupported list No new revisions were added by this update. Summary of changes: .../apache/spark/sql/connect/service/SparkConnectConfigHandler.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (61264f77fd68 -> becbca6752a5)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 61264f77fd68 [SPARK-47603][KUBERNETES][YARN] Resource managers: Migrate logWarn with variables to structured logging framework add becbca6752a5 [SPARK-47860][BUILD][K8S] Upgrade `kubernetes-client` to 6.12.0 No new revisions were added by this update. Summary of changes: dev/deps/spark-deps-hadoop-3-hive-2.3 | 50 +-- pom.xml | 2 +- 2 files changed, 26 insertions(+), 26 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (f3a6ca9e2c47 -> ba673d74973a)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from f3a6ca9e2c47 [SPARK-47357][SQL] Add support for Upper, Lower, InitCap (all collations) add ba673d74973a [SPARK-47856][SQL] Document Mapping Spark SQL Data Types from Oracle and add tests No new revisions were added by this update. Summary of changes: .../spark/sql/jdbc/OracleIntegrationSuite.scala| 47 +++ docs/sql-data-sources-jdbc.md | 144 + 2 files changed, 191 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: Fix network-commont module version to 3.4.4-SNAPSHOT
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 9993c39ef7a1 Fix network-commont module version to 3.4.4-SNAPSHOT 9993c39ef7a1 is described below commit 9993c39ef7a104056b143f8e12c824d6ca68ab60 Author: Dongjoon Hyun AuthorDate: Sun Apr 14 21:44:22 2024 -0700 Fix network-commont module version to 3.4.4-SNAPSHOT --- common/network-common/pom.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 8a1fe5781ba4..da85893ed3b6 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.3 +3.4.4-SNAPSHOT ../../pom.xml - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r68519 - in /dev/spark/v3.4.3-rc2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.3.1/ _site/api/R/deps/jquery-3.6.0/ _site/api
Author: dongjoon Date: Mon Apr 15 02:33:02 2024 New Revision: 68519 Log: Apache Spark v3.4.3-rc2 docs [This commit notification would consist of 2987 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r68518 - /dev/spark/v3.4.3-rc2-bin/
Author: dongjoon Date: Mon Apr 15 01:30:44 2024 New Revision: 68518 Log: Apache Spark v3.4.3-rc2 Added: dev/spark/v3.4.3-rc2-bin/ dev/spark/v3.4.3-rc2-bin/SparkR_3.4.3.tar.gz (with props) dev/spark/v3.4.3-rc2-bin/SparkR_3.4.3.tar.gz.asc dev/spark/v3.4.3-rc2-bin/SparkR_3.4.3.tar.gz.sha512 dev/spark/v3.4.3-rc2-bin/pyspark-3.4.3.tar.gz (with props) dev/spark/v3.4.3-rc2-bin/pyspark-3.4.3.tar.gz.asc dev/spark/v3.4.3-rc2-bin/pyspark-3.4.3.tar.gz.sha512 dev/spark/v3.4.3-rc2-bin/spark-3.4.3-bin-hadoop3-scala2.13.tgz (with props) dev/spark/v3.4.3-rc2-bin/spark-3.4.3-bin-hadoop3-scala2.13.tgz.asc dev/spark/v3.4.3-rc2-bin/spark-3.4.3-bin-hadoop3-scala2.13.tgz.sha512 dev/spark/v3.4.3-rc2-bin/spark-3.4.3-bin-hadoop3.tgz (with props) dev/spark/v3.4.3-rc2-bin/spark-3.4.3-bin-hadoop3.tgz.asc dev/spark/v3.4.3-rc2-bin/spark-3.4.3-bin-hadoop3.tgz.sha512 dev/spark/v3.4.3-rc2-bin/spark-3.4.3-bin-without-hadoop.tgz (with props) dev/spark/v3.4.3-rc2-bin/spark-3.4.3-bin-without-hadoop.tgz.asc dev/spark/v3.4.3-rc2-bin/spark-3.4.3-bin-without-hadoop.tgz.sha512 dev/spark/v3.4.3-rc2-bin/spark-3.4.3.tgz (with props) dev/spark/v3.4.3-rc2-bin/spark-3.4.3.tgz.asc dev/spark/v3.4.3-rc2-bin/spark-3.4.3.tgz.sha512 Added: dev/spark/v3.4.3-rc2-bin/SparkR_3.4.3.tar.gz == Binary file - no diff available. Propchange: dev/spark/v3.4.3-rc2-bin/SparkR_3.4.3.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.4.3-rc2-bin/SparkR_3.4.3.tar.gz.asc == --- dev/spark/v3.4.3-rc2-bin/SparkR_3.4.3.tar.gz.asc (added) +++ dev/spark/v3.4.3-rc2-bin/SparkR_3.4.3.tar.gz.asc Mon Apr 15 01:30:44 2024 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJIBAABCgAyFiEE8oycklwYjDXjRWFN7aAM6DTw/FwFAmYcgt0UHGRvbmdqb29u +QGFwYWNoZS5vcmcACgkQ7aAM6DTw/FwVpQ//feuIM/HSzfE31Blc43Zc05sWRwZ2 +FZeiQGQ6dRbJpjKjLtKMsvlORov9Vx6225VX7bpBqyZ9gQDB8Hq1uoFPiQwagbBn +qFCDh3agkEVxDZHEYjIBNRW5IVR89rFCCLR+YafKnN+alfCaScmGfAhS2JQYvsfM +733xqFyxduPqPUVC7uJfi7qLEqrn8QV13duGzWmIEhAdl03/14UwWektNfQaSfPB +cwv26dnQdUBGoqIEW9eJIM47+Plj1WYMNZtjB60bid5cilm9NjLB6GaHpzijSTHX +Kpssu22OQPzG7d2D2D3EMvpHiAJC1oUIXnzzJiApOFg9dpcDhtH6Jp3J53UuMfBs +pX/Yt/0n8VlZoF6DwREtLi3L5AeJt+wrlQQUSwAUNU7bQrM5mtQmuzc9u/lUfcPQ +74860MGPWPx9+N+5NgSPop9UgP6fOSm53jFXIBJzedHLHhakSTu7+2mHEnpABwTE +02LuAzZVwJ0N/iH0rwIKzNiikydtQyO7nTCUruGuMLcRFM5wnn3DNeSqbw/zRNAl +Fabwq/x1dnA4ryoCV20s7ug0iVBsXN+eQzEegpshrUHZLFma4z7+iieX+xpuSu22 +ZWbbR0sh433tndVREpPg8K2oSsaASxkE0yUlgrp97uHDx7WAixReZCQ40szXJEC4 +MGp+TprPL1Ib4OI= +=w+kM +-END PGP SIGNATURE- Added: dev/spark/v3.4.3-rc2-bin/SparkR_3.4.3.tar.gz.sha512 == --- dev/spark/v3.4.3-rc2-bin/SparkR_3.4.3.tar.gz.sha512 (added) +++ dev/spark/v3.4.3-rc2-bin/SparkR_3.4.3.tar.gz.sha512 Mon Apr 15 01:30:44 2024 @@ -0,0 +1 @@ +2bf5b5b574c916bc74cad122f22c33afec129e56fe6672bb0eaeff7b0218853e1e426e554119b2b0b94c527f05ae041057efbfba53a8916a1c7cc01964366d7b SparkR_3.4.3.tar.gz Added: dev/spark/v3.4.3-rc2-bin/pyspark-3.4.3.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v3.4.3-rc2-bin/pyspark-3.4.3.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.4.3-rc2-bin/pyspark-3.4.3.tar.gz.asc == --- dev/spark/v3.4.3-rc2-bin/pyspark-3.4.3.tar.gz.asc (added) +++ dev/spark/v3.4.3-rc2-bin/pyspark-3.4.3.tar.gz.asc Mon Apr 15 01:30:44 2024 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJIBAABCgAyFiEE8oycklwYjDXjRWFN7aAM6DTw/FwFAmYcgt8UHGRvbmdqb29u +QGFwYWNoZS5vcmcACgkQ7aAM6DTw/Fynzg//TTKfsQ/w2lI1IqYLCJi8FBQJM3vx +XfzGDq+gkyBSc1ohbNn/nMi/OryOXui5o74d1xmiyWz36M97DRXBaI+ldTFi9lgy +DCECCDrNU0RcWkHtXCaP0EahN4pBK+82ftD7KrkZAILdvxZpSU2XIesBjs5lrSpn +NFvwYvWg4COc+tMxvFOybAzqIDhe1geoLeEgizbJcC7PyACH9cQccazco1xoEi6K +d+pMrBSGeV3ReiML7X6/fFXOwqe1P95NrdRLDdl0irow/p08Tbf8YW5b+Abo0j/E +37SEh8veYoX0otOFrc5K/Z4sNh5OlLuzXnhOG03bCXpJ71imZGaJUPW286Tbnl8p +fecG/aZ8Avb0yCWMIzeoffd00ObpFulU8zNQztdGJzQnJR12K1tefNPLA6Al0KE4 +7NljgNDfJL+WGhoip6rYLol7WK1RgGFHPYqcVINz6ZUNChAqdCgrSefCdb//Kavv +Qkq08Q3QqlcHTGJb2hRmvwMuVYTqyFsRu83/EDYdVNdEZ0lWR5P79z+N+Is6SdYc +Z/zcnPD83cNNCahyY97VkcyNBcZvx4maa/4AXCzBlGkebc4Yyymt3sft0/QWIM39 +FQz8mqciCQKfqIU4HI5yogxadmmFd5tELyBhbQz5mbgvFhHHDzzJpuDPIA6jZUQc +C5OZ9KHhC/pF1dg= +=xtzn +-END PGP SIGNATURE- Added: dev/spark/v3.4.3-rc2-bin/pyspark-3.4.3.tar.gz.sha512 == --- dev/spark/v3.4.3-rc2-bin/pyspark-3.4.3.tar.gz.sha512 (added) +++ dev/spark
(spark) 01/01: Preparing development version 3.4.4-SNAPSHOT
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git commit d8c01554e5b88d0739343f13fe1fddd17892b8bc Author: Dongjoon Hyun AuthorDate: Mon Apr 15 00:21:16 2024 + Preparing development version 3.4.4-SNAPSHOT --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 42 files changed, 44 insertions(+), 44 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 8f6d8f1b3b6e..6d2bd4eb9759 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.4.3 +Version: 3.4.4 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: diff --git a/assembly/pom.xml b/assembly/pom.xml index 7df44b0eb82c..4f5d6213bca5 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.4.3 +3.4.4-SNAPSHOT ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 2b6f51089248..161d12d8cd05 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.3 +3.4.4-SNAPSHOT ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 4ab02df6003c..f772d3d080ed 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.3 +3.4.4-SNAPSHOT ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 5b256c629847..eda2c13558ae 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.3 +3.4.4-SNAPSHOT ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index e74655b629df..4f9d962818d2 100644 --- a/common/sketch/pom.xml +++ b/common/sketch/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.3 +3.4.4-SNAPSHOT ../../pom.xml diff --git a/common/tags/pom.xml b/common/tags/pom.xml index d5213c22fd4c..a7a2f2d27adb 100644 --- a/common/tags/pom.xml +++ b/common/tags/pom.xml @@ -22,7
(spark) branch branch-3.4 updated (df3e8e4d2a3a -> d8c01554e5b8)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git from df3e8e4d2a3a Preparing development version 3.4.4-SNAPSHOT add 1eb558c3a6fb Preparing Spark release v3.4.3-rc2 new d8c01554e5b8 Preparing development version 3.4.4-SNAPSHOT The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: common/network-common/pom.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) 01/01: Preparing Spark release v3.4.3-rc2
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to tag v3.4.3-rc2 in repository https://gitbox.apache.org/repos/asf/spark.git commit 1eb558c3a6fbdd59e5a305bc3ab12ce748f6511f Author: Dongjoon Hyun AuthorDate: Mon Apr 15 00:21:11 2024 + Preparing Spark release v3.4.3-rc2 --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 43 files changed, 45 insertions(+), 45 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 6d2bd4eb9759..8f6d8f1b3b6e 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.4.4 +Version: 3.4.3 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: diff --git a/assembly/pom.xml b/assembly/pom.xml index 4f5d6213bca5..7df44b0eb82c 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.4.4-SNAPSHOT +3.4.3 ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 161d12d8cd05..2b6f51089248 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.4-SNAPSHOT +3.4.3 ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index da85893ed3b6..8a1fe5781ba4 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.4-SNAPSHOT +3.4.3 ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index f772d3d080ed..4ab02df6003c 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.4-SNAPSHOT +3.4.3 ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index eda2c13558ae..5b256c629847 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.4-SNAPSHOT +3.4.3 ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index 4f9d96