(spark) branch master updated (b8e7d99d417a -> 11247d804cd3)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from b8e7d99d417a [SPARK-47490][SS] Fix RocksDB Logger constructor use to avoid deprecation warning
     add 11247d804cd3 [SPARK-47494][DOC] Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md | 2 ++
 1 file changed, 2 insertions(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47490][SS] Fix RocksDB Logger constructor use to avoid deprecation warning
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new b8e7d99d417a [SPARK-47490][SS] Fix RocksDB Logger constructor use to avoid deprecation warning

b8e7d99d417a is described below

commit b8e7d99d417ab4bcc3e69d11a0eee5864cb083e3
Author: Anish Shrigondekar
AuthorDate: Wed Mar 20 15:11:51 2024 -0700

[SPARK-47490][SS] Fix RocksDB Logger constructor use to avoid deprecation warning

### What changes were proposed in this pull request?
Fix the RocksDB Logger constructor use to avoid a deprecation warning.

### Why are the changes needed?
With the latest RocksDB upgrade, the Logger constructor in use was deprecated, which produced a compiler warning:

```
[warn] val dbLogger = new Logger(dbOptions) {
[warn]                ^
[warn] one warning found
[warn] two warnings found
[info] compiling 36 Scala sources and 16 Java sources to /Users/anish.shrigondekar/spark/spark/sql/core/target/scala-2.13/classes ...
[warn] -target is deprecated: Use -release instead to compile against the correct platform API.
[warn] Applicable -Wconf / nowarn filters for this warning: msg=, cat=deprecation
[warn] /Users/anish.shrigondekar/spark/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala:851:24: constructor Logger in class Logger is deprecated
[warn] Applicable -Wconf / nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.sql.execution.streaming.state.RocksDB.createLogger.dbLogger, origin=org.rocksdb.Logger.
```

Updated to use the new recommendation as mentioned here:
https://javadoc.io/doc/org.rocksdb/rocksdbjni/latest/org/rocksdb/Logger.html

Recommendation:
```
[Logger](https://javadoc.io/static/org.rocksdb/rocksdbjni/8.11.3/org/rocksdb/Logger.html#Logger-org.rocksdb.DBOptions-)([DBOptions](https://javadoc.io/static/org.rocksdb/rocksdbjni/8.11.3/org/rocksdb/DBOptions.html) dboptions)
Deprecated. Use [Logger(InfoLogLevel)](https://javadoc.io/static/org.rocksdb/rocksdbjni/8.11.3/org/rocksdb/Logger.html#Logger-org.rocksdb.InfoLogLevel-) instead, e.g. new Logger(dbOptions.infoLogLevel()).
```

After the fix, the warning is no longer seen.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing unit tests

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45616 from anishshri-db/task/SPARK-47490.

Authored-by: Anish Shrigondekar
Signed-off-by: Dongjoon Hyun
---
 .../scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala
index 950baba9031b..8fad5ce7bd6a 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala
@@ -848,7 +848,7 @@ class RocksDB(
   /** Create a native RocksDB logger that forwards native logs to log4j with correct log levels.
    */
   private def createLogger(): Logger = {
-    val dbLogger = new Logger(dbOptions) {
+    val dbLogger = new Logger(dbOptions.infoLogLevel()) {
       override def log(infoLogLevel: InfoLogLevel, logMsg: String) = {
         // Map DB log level to log4j levels
         // Warn is mapped to info because RocksDB warn is too verbose
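The patched `log` override forwards RocksDB's native log levels to log4j, deliberately demoting warnings. The following is a hedged, illustrative Python sketch of that mapping idea only; the real code is the Scala shown in the diff, and every table entry here except the warn-to-info demotion (which the patched comment calls out) is an assumption, not Spark code.

```python
# Illustrative sketch of mapping RocksDB InfoLogLevel names to log4j-style
# levels. WARN is deliberately demoted to INFO because RocksDB's warn output
# is too verbose (per the comment in the patch). The other entries are
# assumptions for illustration, not taken from Spark.
ROCKSDB_TO_LOG4J = {
    "DEBUG_LEVEL": "DEBUG",
    "INFO_LEVEL": "INFO",
    "WARN_LEVEL": "INFO",   # warn demoted to info on purpose
    "ERROR_LEVEL": "ERROR",
    "FATAL_LEVEL": "ERROR",
}

def map_level(rocksdb_level: str) -> str:
    # Fall back to INFO for any level name we do not recognize.
    return ROCKSDB_TO_LOG4J.get(rocksdb_level, "INFO")
```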
(spark) branch master updated: [SPARK-47486][CONNECT] Remove unused private `ArrowDeserializers.getString` method
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new f66274e92d1c [SPARK-47486][CONNECT] Remove unused private `ArrowDeserializers.getString` method

f66274e92d1c is described below

commit f66274e92d1ce6e65fecd45711da59eb08a9d296
Author: yangjie01
AuthorDate: Wed Mar 20 15:10:49 2024 -0700

[SPARK-47486][CONNECT] Remove unused private `ArrowDeserializers.getString` method

### What changes were proposed in this pull request?
The private method `getString` in `ArrowDeserializers` is no longer used after SPARK-9 | https://github.com/apache/spark/pull/42076, this pr removes it.

### Why are the changes needed?
Code clean up.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45610 from LuciferYang/SPARK-47486.

Authored-by: yangjie01
Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/connect/client/arrow/ArrowDeserializer.scala | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala
index ac9619487f02..eaf2927863ec 100644
--- a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala
+++ b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala
@@ -29,10 +29,9 @@ import scala.collection.mutable
 import scala.reflect.ClassTag

 import org.apache.arrow.memory.BufferAllocator
-import org.apache.arrow.vector.{FieldVector, VarCharVector, VectorSchemaRoot}
+import org.apache.arrow.vector.{FieldVector, VectorSchemaRoot}
 import org.apache.arrow.vector.complex.{ListVector, MapVector, StructVector}
 import org.apache.arrow.vector.ipc.ArrowReader
-import org.apache.arrow.vector.util.Text

 import org.apache.spark.sql.catalyst.ScalaReflection
 import org.apache.spark.sql.catalyst.encoders.AgnosticEncoder
@@ -468,16 +467,6 @@ object ArrowDeserializers {
   private def isTuple(cls: Class[_]): Boolean =
     cls.getName.startsWith("scala.Tuple")

-  private def getString(v: VarCharVector, i: Int): String = {
-    // This is currently a bit heavy on allocations:
-    // - byte array created in VarCharVector.get
-    // - CharBuffer created CharSetEncoder
-    // - char array in String
-    // By using direct buffers and reusing the char buffer
-    // we could get rid of the first two allocations.
-    Text.decode(v.get(i))
-  }
-
   private def loadListIntoBuilder(
       v: ListVector,
       i: Int,
(spark) branch master updated: [SPARK-45393][BUILD] Upgrade Hadoop to 3.4.0
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 49b4c3bc9c09 [SPARK-45393][BUILD] Upgrade Hadoop to 3.4.0

49b4c3bc9c09 is described below

commit 49b4c3bc9c09325de941dfaf41e4fd3a4a4c345f
Author: Dongjoon Hyun
AuthorDate: Wed Mar 20 10:37:51 2024 -0700

[SPARK-45393][BUILD] Upgrade Hadoop to 3.4.0

### What changes were proposed in this pull request?
This PR aims to upgrade to Apache Hadoop 3.4.0 for Apache Spark 4.0.0.

### Why are the changes needed?
To bring the new features like the following:
- https://hadoop.apache.org/docs/r3.4.0
- [HADOOP-18995](https://issues.apache.org/jira/browse/HADOOP-18995) Upgrade AWS SDK version to 2.21.33 for `S3 Express One Zone`
- [HADOOP-18328](https://issues.apache.org/jira/browse/HADOOP-18328) Supports `S3 on Outposts`

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45583 from dongjoon-hyun/SPARK-45393.

Lead-authored-by: Dongjoon Hyun
Co-authored-by: YangJie
Signed-off-by: Dongjoon Hyun
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 27 --
 pom.xml | 2 +-
 .../spark/deploy/yarn/YarnClusterSuite.scala | 3 ++-
 3 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 86da61d89149..903c7a245af3 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -9,7 +9,7 @@ algebra_2.13/2.8.0//algebra_2.13-2.8.0.jar
 aliyun-java-sdk-core/4.5.10//aliyun-java-sdk-core-4.5.10.jar
 aliyun-java-sdk-kms/2.11.0//aliyun-java-sdk-kms-2.11.0.jar
 aliyun-java-sdk-ram/3.1.0//aliyun-java-sdk-ram-3.1.0.jar
-aliyun-sdk-oss/3.13.0//aliyun-sdk-oss-3.13.0.jar
+aliyun-sdk-oss/3.13.2//aliyun-sdk-oss-3.13.2.jar
 annotations/17.0.0//annotations-17.0.0.jar
 antlr-runtime/3.5.2//antlr-runtime-3.5.2.jar
 antlr4-runtime/4.13.1//antlr4-runtime-4.13.1.jar
@@ -24,7 +24,6 @@ audience-annotations/0.12.0//audience-annotations-0.12.0.jar
 avro-ipc/1.11.3//avro-ipc-1.11.3.jar
 avro-mapred/1.11.3//avro-mapred-1.11.3.jar
 avro/1.11.3//avro-1.11.3.jar
-aws-java-sdk-bundle/1.12.367//aws-java-sdk-bundle-1.12.367.jar
 azure-data-lake-store-sdk/2.3.9//azure-data-lake-store-sdk-2.3.9.jar
 azure-keyvault-core/1.0.0//azure-keyvault-core-1.0.0.jar
 azure-storage/7.0.1//azure-storage-7.0.1.jar
@@ -32,6 +31,7 @@ blas/3.0.3//blas-3.0.3.jar
 bonecp/0.8.0.RELEASE//bonecp-0.8.0.RELEASE.jar
 breeze-macros_2.13/2.1.0//breeze-macros_2.13-2.1.0.jar
 breeze_2.13/2.1.0//breeze_2.13-2.1.0.jar
+bundle/2.23.19//bundle-2.23.19.jar
 cats-kernel_2.13/2.8.0//cats-kernel_2.13-2.8.0.jar
 chill-java/0.10.0//chill-java-0.10.0.jar
 chill_2.13/0.10.0//chill_2.13-0.10.0.jar
@@ -65,21 +65,23 @@ derbytools/10.16.1.1//derbytools-10.16.1.1.jar
 dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar
 eclipse-collections-api/11.1.0//eclipse-collections-api-11.1.0.jar
 eclipse-collections/11.1.0//eclipse-collections-11.1.0.jar
+esdk-obs-java/3.20.4.2//esdk-obs-java-3.20.4.2.jar
 flatbuffers-java/23.5.26//flatbuffers-java-23.5.26.jar
 gcs-connector/hadoop3-2.2.20/shaded/gcs-connector-hadoop3-2.2.20-shaded.jar
 gmetric4j/1.0.10//gmetric4j-1.0.10.jar
 gson/2.2.4//gson-2.2.4.jar
 guava/14.0.1//guava-14.0.1.jar
-hadoop-aliyun/3.3.6//hadoop-aliyun-3.3.6.jar
-hadoop-annotations/3.3.6//hadoop-annotations-3.3.6.jar
-hadoop-aws/3.3.6//hadoop-aws-3.3.6.jar
-hadoop-azure-datalake/3.3.6//hadoop-azure-datalake-3.3.6.jar
-hadoop-azure/3.3.6//hadoop-azure-3.3.6.jar
-hadoop-client-api/3.3.6//hadoop-client-api-3.3.6.jar
-hadoop-client-runtime/3.3.6//hadoop-client-runtime-3.3.6.jar
-hadoop-cloud-storage/3.3.6//hadoop-cloud-storage-3.3.6.jar
-hadoop-shaded-guava/1.1.1//hadoop-shaded-guava-1.1.1.jar
-hadoop-yarn-server-web-proxy/3.3.6//hadoop-yarn-server-web-proxy-3.3.6.jar
+hadoop-aliyun/3.4.0//hadoop-aliyun-3.4.0.jar
+hadoop-annotations/3.4.0//hadoop-annotations-3.4.0.jar
+hadoop-aws/3.4.0//hadoop-aws-3.4.0.jar
+hadoop-azure-datalake/3.4.0//hadoop-azure-datalake-3.4.0.jar
+hadoop-azure/3.4.0//hadoop-azure-3.4.0.jar
+hadoop-client-api/3.4.0//hadoop-client-api-3.4.0.jar
+hadoop-client-runtime/3.4.0//hadoop-client-runtime-3.4.0.jar
+hadoop-cloud-storage/3.4.0//hadoop-cloud-storage-3.4.0.jar
+hadoop-huaweicloud/3.4.0//hadoop-huaweicloud-3.4.0.jar
+hadoop-shaded-guava/1.2.0//hadoop-shaded-guava-1.2.0.jar
+hadoop-yarn-server-web-proxy/3.4.0//hadoop-yarn-server-web-proxy-3.4.0.jar
 hive-beeline/2.3.9//hive-beeline-2.3.9.jar
 hive-cli/2.3.9//hive
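The bulk of this commit is mechanical version bumps in the `spark-deps` manifest, whose entries use a `name/version//jar` line format. As a hedged sketch (the helper functions below are hypothetical illustrations, not Spark tooling), such a manifest diff can be computed by parsing both versions of the file and comparing per-artifact versions:

```python
# Hypothetical helpers: diff two dependency manifests in the
# "name/version//jar" format used by dev/deps/spark-deps-hadoop-3-hive-2.3.
def parse(lines):
    """Map artifact name -> version for each manifest line."""
    out = {}
    for line in lines:
        parts = line.split("/")
        out[parts[0]] = parts[1]
    return out

def version_bumps(old_lines, new_lines):
    """Return {name: (old_version, new_version)} for changed artifacts."""
    old, new = parse(old_lines), parse(new_lines)
    return {n: (old[n], new[n]) for n in old if n in new and old[n] != new[n]}

old = ["hadoop-aws/3.3.6//hadoop-aws-3.3.6.jar", "guava/14.0.1//guava-14.0.1.jar"]
new = ["hadoop-aws/3.4.0//hadoop-aws-3.4.0.jar", "guava/14.0.1//guava-14.0.1.jar"]
# version_bumps(old, new) -> {"hadoop-aws": ("3.3.6", "3.4.0")}
```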
(spark) branch master updated: [SPARK-47462][SQL] Align mappings of other unsigned numeric types with TINYINT in MySQLDialect
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new a34c8ceb19bd [SPARK-47462][SQL] Align mappings of other unsigned numeric types with TINYINT in MySQLDialect

a34c8ceb19bd is described below

commit a34c8ceb19bd1c1548a60bb144d1c587a2861cd8
Author: Kent Yao
AuthorDate: Wed Mar 20 09:31:26 2024 -0700

[SPARK-47462][SQL] Align mappings of other unsigned numeric types with TINYINT in MySQLDialect

### What changes were proposed in this pull request?
Align mappings of other unsigned numeric types with TINYINT in MySQLDialect. TINYINT maps to ByteType, and TINYINT UNSIGNED maps to ShortType. In this PR, we:
- map SMALLINT to ShortType and SMALLINT UNSIGNED to IntegerType. Without this, both of them map to IntegerType.
- map MEDIUMINT UNSIGNED to IntegerType, and leave MEDIUMINT as-is. Without this, MEDIUMINT UNSIGNED uses LongType.

Other unsigned/signed types remain unchanged; we only improve the test coverage.

### Why are the changes needed?
Consistency and efficiency while reading MySQL numeric values.

### Does this PR introduce _any_ user-facing change?
Yes, the mappings described in the first section.

### How was this patch tested?
New tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45588 from yaooqinn/SPARK-47462.

Authored-by: Kent Yao
Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/jdbc/MySQLIntegrationSuite.scala | 39 ++
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 10 ++
 2 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
index 3d65b4f305b3..5b2214f2efd6 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
@@ -53,11 +53,19 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     conn.prepareStatement("CREATE TABLE numbers (onebit BIT(1), tenbits BIT(10), "
       + "small SMALLINT, med MEDIUMINT, nor INT, big BIGINT, deci DECIMAL(40,20), flt FLOAT, "
-      + "dbl DOUBLE, tiny TINYINT, u_tiny TINYINT UNSIGNED)").executeUpdate()
+      + "dbl DOUBLE, tiny TINYINT)").executeUpdate()

     conn.prepareStatement("INSERT INTO numbers VALUES (b'0', b'1000100101', "
       + "17, 7, 123456789, 123456789012345, 123456789012345.123456789012345, "
-      + "42.75, 1.0002, -128, 255)").executeUpdate()
+      + "42.75, 1.0002, -128)").executeUpdate()
+
+    conn.prepareStatement("CREATE TABLE unsigned_numbers ("
+      + "tiny TINYINT UNSIGNED, small SMALLINT UNSIGNED, med MEDIUMINT UNSIGNED,"
+      + "nor INT UNSIGNED, big BIGINT UNSIGNED, deci DECIMAL(40,20) UNSIGNED,"
+      + "dbl DOUBLE UNSIGNED)").executeUpdate()
+
+    conn.prepareStatement("INSERT INTO unsigned_numbers VALUES (255, 65535, 16777215, 4294967295,"
+      + "9223372036854775808, 123456789012345.123456789012345, 1.0002)").executeUpdate()

     conn.prepareStatement("CREATE TABLE dates (d DATE, t TIME, dt DATETIME, ts TIMESTAMP, "
       + "yr YEAR)").executeUpdate()
@@ -87,10 +95,10 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     val rows = df.collect()
     assert(rows.length == 1)
     val types = rows(0).toSeq.map(x => x.getClass.toString)
-    assert(types.length == 11)
+    assert(types.length == 10)
     assert(types(0).equals("class java.lang.Boolean"))
     assert(types(1).equals("class java.lang.Long"))
-    assert(types(2).equals("class java.lang.Integer"))
+    assert(types(2).equals("class java.lang.Short"))
     assert(types(3).equals("class java.lang.Integer"))
     assert(types(4).equals("class java.lang.Integer"))
     assert(types(5).equals("class java.lang.Long"))
@@ -98,10 +106,9 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     assert(types(7).equals("class java.lang.Double"))
     assert(types(8).equals("class java.lang.Double"))
     assert(types(9).equals("class java.lang.Byte"))
-    assert(types(10).equals("class java.lang.Short"))
     assert(rows(0).getBoolean(0) == false)
     assert(rows
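The mapping rule behind SPARK-47462 is simple: each unsigned MySQL type must land in the smallest signed JVM type whose range contains its maximum value (e.g. TINYINT UNSIGNED's max of 255 overflows a signed byte, so it needs Short). A hedged sketch of that rule; the ranges are standard MySQL, but the function itself is illustrative, not Spark code:

```python
# Illustrative: pick the smallest signed JVM type that can hold the maximum
# value of each unsigned MySQL integer type. Not Spark code; the range
# constants are standard MySQL/JVM limits.
UNSIGNED_MAX = {
    "TINYINT UNSIGNED": 255,
    "SMALLINT UNSIGNED": 65535,
    "MEDIUMINT UNSIGNED": 16777215,
    "INT UNSIGNED": 4294967295,
}

# Signed JVM type maxima, smallest first.
SIGNED_MAX = [
    ("Byte", 127),
    ("Short", 32767),
    ("Integer", 2147483647),
    ("Long", 9223372036854775807),
]

def smallest_signed_type(max_value: int) -> str:
    for name, limit in SIGNED_MAX:
        if max_value <= limit:
            return name
    return "Decimal"  # e.g. BIGINT UNSIGNED exceeds Long

# Consistent with the commit: TINYINT UNSIGNED -> Short,
# SMALLINT UNSIGNED -> Integer, MEDIUMINT UNSIGNED -> Integer,
# INT UNSIGNED -> Long.
```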
(spark) branch branch-3.5 updated: [SPARK-47481][INFRA][3.5] Fix Python linter
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 9baf82b1c97a [SPARK-47481][INFRA][3.5] Fix Python linter

9baf82b1c97a is described below

commit 9baf82b1c97a792a3733dedccf1c03737b592bbd
Author: panbingkun
AuthorDate: Wed Mar 20 07:19:29 2024 -0700

[SPARK-47481][INFRA][3.5] Fix Python linter

### What changes were proposed in this pull request?
The pr aims to fix the Python linter issue on `branch-3.5` by pinning `matplotlib==3.7.2`.

### Why are the changes needed?
Fix the Python linter issue on `branch-3.5`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45550 from panbingkun/branch-3.5_scheduled_job.

Authored-by: panbingkun
Signed-off-by: Dongjoon Hyun
---
 dev/infra/Dockerfile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index d3fcd7ab3622..f0b88666c040 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -65,10 +65,10 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht
 ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"

 RUN pypy3 -m pip install numpy 'pandas<=2.0.3' scipy coverage matplotlib
-RUN python3.9 -m pip install numpy 'pyarrow==12.0.1' 'pandas<=2.0.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
+RUN python3.9 -m pip install 'numpy==1.25.1' 'pyarrow==12.0.1' 'pandas<=2.0.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage 'matplotlib==3.7.2' openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'

 # Add Python deps for Spark Connect.
 RUN python3.9 -m pip install 'grpcio>=1.48,<1.57' 'grpcio-status>=1.48,<1.57' 'protobuf==3.20.3' 'googleapis-common-protos==1.56.4'

 # Add torch as a testing dependency for TorchDistributor
-RUN python3.9 -m pip install torch torchvision torcheval
+RUN python3.9 -m pip install 'torch==2.0.1' 'torchvision==0.15.2' torcheval
(spark) branch branch-3.4 updated: [SPARK-47481][INFRA][3.4] Pin `matplotlib<3.3.0` to fix Python linter failure
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 4de8000f21a4 [SPARK-47481][INFRA][3.4] Pin `matplotlib<3.3.0` to fix Python linter failure

4de8000f21a4 is described below

commit 4de8000f21a48796d30af37bc57269395792a254
Author: panbingkun
AuthorDate: Wed Mar 20 07:15:32 2024 -0700

[SPARK-47481][INFRA][3.4] Pin `matplotlib<3.3.0` to fix Python linter failure

### What changes were proposed in this pull request?
The pr aims to fix the Python linter issue on branch-3.4 by pinning `matplotlib<3.3.0`.

### Why are the changes needed?
- Through PR https://github.com/apache/spark/pull/45600, we found that the version of `matplotlib` in our Docker image was `3.8.2`, which clearly did not meet the original requirements for `branch-3.4`:
  https://github.com/panbingkun/spark/actions/runs/8354370179/job/22869580038
  https://github.com/apache/spark/assets/15246973/dd425bfb-ce5f-4a99-a487-a462d6e9
  https://github.com/apache/spark/blob/branch-3.4/dev/requirements.txt#L12
  https://github.com/apache/spark/assets/15246973/70485648-b886-4218-bb21-c41a85d5eecf
- Fix as follows:
  https://github.com/apache/spark/assets/15246973/db31d8fb-0b6c-4925-95e1-0ca0247bb9f5

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45608 from panbingkun/branch_3.4_pin_matplotlib.

Authored-by: panbingkun
Signed-off-by: Dongjoon Hyun
---
 dev/infra/Dockerfile | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 68d27052437b..5ebd10339be9 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -37,6 +37,7 @@ RUN add-apt-repository ppa:pypy/ppa
 RUN apt update
 RUN $APT_INSTALL gfortran libopenblas-dev liblapack-dev
 RUN $APT_INSTALL build-essential
+RUN $APT_INSTALL python3-matplotlib

 RUN mkdir -p /usr/local/pypy/pypy3.7 && \
     curl -sqL https://downloads.python.org/pypy/pypy3.7-v7.3.7-linux64.tar.bz2 | tar xjf - -C /usr/local/pypy/pypy3.7 --strip-components=1 && \
@@ -64,8 +65,8 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht
 # See more in SPARK-39735
 ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"

-RUN pypy3 -m pip install numpy 'pandas<=1.5.3' scipy coverage matplotlib
-RUN python3.9 -m pip install 'numpy==1.23.5' 'pyarrow==12.0.1' 'pandas<=1.5.3' scipy unittest-xml-reporting plotly>=4.8 scikit-learn 'mlflow>=1.0' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
+RUN pypy3 -m pip install numpy 'pandas<=1.5.3' scipy coverage 'matplotlib<3.3.0'
+RUN python3.9 -m pip install 'numpy==1.23.5' 'pyarrow==12.0.1' 'pandas<=1.5.3' scipy unittest-xml-reporting plotly>=4.8 scikit-learn 'mlflow>=1.0' coverage 'matplotlib<3.3.0' openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'

 # Add Python deps for Spark Connect.
 RUN python3.9 -m pip install grpcio protobuf googleapis-common-protos grpcio-status
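The failure mode in both pinning commits above is an unpinned dependency drifting past the version a branch supports (here, matplotlib 3.8.2 in an image that branch-3.4 expected to satisfy `<3.3.0`). A minimal stdlib-only sketch of checking an installed version against such an upper-bound pin; real tooling would rely on pip's resolver or `packaging.version`, and this naive tuple comparison ignores pre-release and post-release tags:

```python
# Naive illustration of an exclusive upper-bound version check like
# matplotlib<3.3.0. Handles plain X.Y.Z versions only.
def version_tuple(v: str) -> tuple:
    return tuple(int(p) for p in v.split(".")[:3])

def satisfies_upper_bound(installed: str, bound: str) -> bool:
    """True if `installed` < `bound` (exclusive upper bound)."""
    return version_tuple(installed) < version_tuple(bound)

# The Docker image had matplotlib 3.8.2, which violates the <3.3.0 pin:
# satisfies_upper_bound("3.8.2", "3.3.0") -> False
# satisfies_upper_bound("3.2.2", "3.3.0") -> True
```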
(spark) branch branch-3.4 updated: [SPARK-47472][INFRA][3.4] Pin `numpy` to 1.23.5 in `dev/infra/Dockerfile`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new d25f49a14733 [SPARK-47472][INFRA][3.4] Pin `numpy` to 1.23.5 in `dev/infra/Dockerfile`

d25f49a14733 is described below

commit d25f49a14733c5a0e872498cab40a30a5ebc28b4
Author: Dongjoon Hyun
AuthorDate: Tue Mar 19 20:53:45 2024 -0700

[SPARK-47472][INFRA][3.4] Pin `numpy` to 1.23.5 in `dev/infra/Dockerfile`

### What changes were proposed in this pull request?
This PR aims to pin `numpy` to 1.23.5 in `dev/infra/Dockerfile` to recover the following test failure.

### Why are the changes needed?
`numpy==1.23.5` was the version of the last successful run.
- https://github.com/apache/spark/actions/runs/8276453417/job/22725387782

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

Closes #45595 from dongjoon-hyun/pin-numpy.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 dev/infra/Dockerfile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 93d8793826ff..68d27052437b 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -65,7 +65,7 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht
 ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"

 RUN pypy3 -m pip install numpy 'pandas<=1.5.3' scipy coverage matplotlib
-RUN python3.9 -m pip install numpy 'pyarrow==12.0.1' 'pandas<=1.5.3' scipy unittest-xml-reporting plotly>=4.8 scikit-learn 'mlflow>=1.0' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
+RUN python3.9 -m pip install 'numpy==1.23.5' 'pyarrow==12.0.1' 'pandas<=1.5.3' scipy unittest-xml-reporting plotly>=4.8 scikit-learn 'mlflow>=1.0' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'

 # Add Python deps for Spark Connect.
 RUN python3.9 -m pip install grpcio protobuf googleapis-common-protos grpcio-status
(spark) branch master updated (bc378f4ff5e2 -> 61d7b0f24fc9)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from bc378f4ff5e2 [SPARK-47330][SQL][TESTS] XML: Added XmlExpressionsSuite
     add 61d7b0f24fc9 [SPARK-47470][SQL][TESTS] Ignore `IntentionallyFaultyConnectionProvider` error in `CliSuite`

No new revisions were added by this update.

Summary of changes:
 .../test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated: [SPARK-47468][BUILD] Exclude `logback` dependency from SBT like Maven
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new c32d27850e2e [SPARK-47468][BUILD] Exclude `logback` dependency from SBT like Maven

c32d27850e2e is described below

commit c32d27850e2ea5f8cb36099ab8453b09f4c70861
Author: Dongjoon Hyun
AuthorDate: Tue Mar 19 17:52:38 2024 -0700

[SPARK-47468][BUILD] Exclude `logback` dependency from SBT like Maven

### What changes were proposed in this pull request?
This PR aims to exclude `logback` from the SBT dependencies, like Maven, to fix the following SBT issue.
```
[info] stderr> SLF4J: Class path contains multiple SLF4J bindings.
[info] stderr> SLF4J: Found binding in [jar:file:/home/runner/work/spark/spark/assembly/target/scala-2.13/jars/logback-classic-1.2.13.jar!/org/slf4j/impl/StaticLoggerBinder.class]
[info] stderr> SLF4J: Found binding in [jar:file:/home/runner/.cache/coursier/v1/https/maven-central.storage-download.googleapis.com/maven2/ch/qos/logback/logback-classic/1.2.13/logback-classic-1.2.13.jar!/org/slf4j/impl/StaticLoggerBinder.class]
[info] stderr> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
[info] stderr> SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
```

### Why are the changes needed?
**Maven**
```
$ build/mvn dependency:tree --pl core | grep logback
Using `mvn` from path: /opt/homebrew/bin/mvn
Using SPARK_LOCAL_IP=localhost
```

**SBT (BEFORE)**
```
$ build/sbt "core/test:dependencyTree" | grep logback
Using SPARK_LOCAL_IP=localhost
[info]   | +-ch.qos.logback:logback-classic:1.2.13
[info]   | | +-ch.qos.logback:logback-core:1.2.13
[info]   | +-ch.qos.logback:logback-core:1.2.13
[info]   | | +-ch.qos.logback:logback-classic:1.2.13
[info]   | | | +-ch.qos.logback:logback-core:1.2.13
[info]   | | +-ch.qos.logback:logback-core:1.2.13
[info]   | +-ch.qos.logback:logback-classic:1.2.13
[info]   | | +-ch.qos.logback:logback-core:1.2.13
[info]   | +-ch.qos.logback:logback-core:1.2.13
```

**SBT (AFTER)**
```
$ build/sbt "core/test:dependencyTree" | grep logback
Using SPARK_LOCAL_IP=localhost
```

### Does this PR introduce _any_ user-facing change?
No. This only fixes developer and CI issues.

### How was this patch tested?
Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45594 from dongjoon-hyun/SPARK-47468.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 project/SparkBuild.scala | 1 +
 1 file changed, 1 insertion(+)

diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index b7b9589568e1..3d89af2aa7b4 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -1078,6 +1078,7 @@ object ExcludedDependencies {
   // purpose only. Here we exclude them from the whole project scope and add them w/ yarn only.
   excludeDependencies ++= Seq(
     ExclusionRule(organization = "com.sun.jersey"),
+    ExclusionRule(organization = "ch.qos.logback"),
     ExclusionRule("javax.ws.rs", "jsr311-api"))
 )
}
(spark) branch master updated: [SPARK-47464][INFRA] Update `labeler.yml` for module `common/sketch` and `common/variant`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 32ee2d7936a5 [SPARK-47464][INFRA] Update `labeler.yml` for module `common/sketch` and `common/variant`

32ee2d7936a5 is described below

commit 32ee2d7936a50a653e8ea599d622fbc550fa5eac
Author: panbingkun
AuthorDate: Tue Mar 19 16:27:15 2024 -0700

[SPARK-47464][INFRA] Update `labeler.yml` for module `common/sketch` and `common/variant`

### What changes were proposed in this pull request?
The pr aims to update `labeler.yml` for the modules `common/sketch` and `common/variant`.

### Why are the changes needed?
Currently, the above modules are not classified in the file `labeler.yml`, so the GitHub Action labeler cannot automatically tag submitted PRs.

### Does this PR introduce _any_ user-facing change?
Yes, only for dev.

### How was this patch tested?
Manually test: after this PR is merged, continue to observe.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45590 from panbingkun/SPARK-47464.

Authored-by: panbingkun
Signed-off-by: Dongjoon Hyun
---
 .github/labeler.yml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/.github/labeler.yml b/.github/labeler.yml
index 7d24390f2968..104eac99ec4d 100644
--- a/.github/labeler.yml
+++ b/.github/labeler.yml
@@ -101,6 +101,8 @@ SQL:
       ]
     - any-glob-to-any-file: [
       'common/unsafe/**/*',
+      'common/sketch/**/*',
+      'common/variant/**/*',
       'bin/spark-sql*',
       'bin/beeline*',
       'sbin/*thriftserver*.sh',
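The labeler works by matching each changed file path against the glob lists in `labeler.yml` and applying the label (here, `SQL`) on any match. A rough sketch of the effect of the two new patterns, using Python's `fnmatch` as an approximation; note that fnmatch's glob semantics differ from the GitHub labeler's `**` handling, so this is illustrative only:

```python
import fnmatch

# Approximate the "any-glob-to-any-file" rule: a file gets the SQL label if
# it matches ANY pattern in the list. Only a subset of the real patterns is
# shown; fnmatch is a loose stand-in for the labeler's glob engine.
patterns = ["common/unsafe/**/*", "common/sketch/**/*", "common/variant/**/*"]

def matches_sql_label(path: str) -> bool:
    return any(fnmatch.fnmatch(path, p) for p in patterns)
```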
(spark) branch master updated (90560dce85b0 -> db531c6ee719)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 90560dce85b0 [SPARK-47458][CORE] Fix the problem with calculating the maximum concurrent tasks for the barrier stage
     add db531c6ee719 [SPARK-47461][CORE] Remove private function `totalRunningTasksPerResourceProfile` from `ExecutorAllocationManager`

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala | 4 
 .../test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala | 4 +---
 2 files changed, 1 insertion(+), 7 deletions(-)
(spark) branch master updated (b6a836946311 -> a6bffcc3e5f0)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from b6a836946311 [SPARK-47454][PYTHON][CONNECT][TESTS] Split `pyspark.sql.tests.test_dataframe`
     add a6bffcc3e5f0 [SPARK-47457][SQL] Fix `IsolatedClientLoader.supportsHadoopShadedClient` to handle Hadoop 3.4+

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala | 2 ++
 .../org/apache/spark/sql/hive/client/HadoopVersionInfoSuite.scala | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)
(spark) branch master updated: [SPARK-47452][INFRA] Use `Ubuntu 22.04` in `dev/infra/Dockerfile`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ef94f7094989 [SPARK-47452][INFRA] Use `Ubuntu 22.04` in `dev/infra/Dockerfile` ef94f7094989 is described below commit ef94f709498974cb31e805541e0803270cd5c39e Author: Dongjoon Hyun AuthorDate: Mon Mar 18 23:15:32 2024 -0700 [SPARK-47452][INFRA] Use `Ubuntu 22.04` in `dev/infra/Dockerfile` ### What changes were proposed in this pull request? This PR aims to use `Ubuntu 22.04` in `dev/infra/Dockerfile` for Apache Spark 4.0.0.

| Installed SW | BEFORE | AFTER |
| --- | --- | --- |
| Ubuntu LTS | 20.04.5 | 22.04.4 |
| Java | 17.0.10 | 17.0.10 |
| PyPy 3.8 | 3.8.16 | 3.8.16 |
| Python 3.9 | 3.9.5 | 3.9.18 |
| Python 3.10 | 3.10.13 | 3.10.12 |
| Python 3.11 | 3.11.8 | 3.11.8 |
| Python 3.12 | 3.12.2 | 3.12.2 |
| R | 3.6.3 | 4.1.2 |

### Why are the changes needed? - Since Apache Spark 3.4.0, we use `Ubuntu 20.04` via SPARK-39522. - From Apache Spark 4.0.0, this PR aims to use `Ubuntu 22.04` mainly. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45576 from dongjoon-hyun/SPARK-47452. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/infra/Dockerfile | 52 +--- 1 file changed, 25 insertions(+), 27 deletions(-) diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index 64adf33e6742..f17ee58c9d90 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -15,11 +15,11 @@ # limitations under the License. # -# Image for building and testing Spark branches. Based on Ubuntu 20.04. +# Image for building and testing Spark branches. Based on Ubuntu 22.04.
# See also in https://hub.docker.com/_/ubuntu -FROM ubuntu:focal-20221019 +FROM ubuntu:jammy-20240227 -ENV FULL_REFRESH_DATE 20240117 +ENV FULL_REFRESH_DATE 20240318 ENV DEBIAN_FRONTEND noninteractive ENV DEBCONF_NONINTERACTIVE_SEEN true @@ -50,10 +50,8 @@ RUN apt-get update && apt-get install -y \ openjdk-17-jdk-headless \ pandoc \ pkg-config \ -python3-pip \ -python3-setuptools \ -python3.8 \ -python3.9 \ +python3.10 \ +python3-psutil \ qpdf \ r-base \ ruby \ @@ -64,10 +62,10 @@ RUN apt-get update && apt-get install -y \ && rm -rf /var/lib/apt/lists/* -RUN echo 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' >> /etc/apt/sources.list +RUN echo 'deb https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/' >> /etc/apt/sources.list RUN gpg --keyserver hkps://keyserver.ubuntu.com --recv-key E298A3A825C0D65DFD57CBB651716619E084DAB9 RUN gpg -a --export E084DAB9 | apt-key add - -RUN add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' +RUN add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/' # See more in SPARK-39959, roxygen2 < 7.2.1 RUN Rscript -e "install.packages(c('devtools', 'knitr', 'markdown', \ @@ -82,9 +80,6 @@ RUN Rscript -e "install.packages(c('devtools', 'knitr', 'markdown', \ ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library" -RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9 - - RUN add-apt-repository ppa:pypy/ppa RUN mkdir -p /usr/local/pypy/pypy3.8 && \ curl -sqL https://downloads.python.org/pypy/pypy3.8-v7.3.11-linux64.tar.bz2 | tar xjf - -C /usr/local/pypy/pypy3.8 --strip-components=1 && \ @@ -98,41 +93,44 @@ ARG BASIC_PIP_PKGS="numpy pyarrow>=15.0.0 six==1.16.0 pandas<=2.2.1 scipy plotly # Python deps for Spark Connect ARG CONNECT_PIP_PKGS="grpcio==1.62.0 grpcio-status==1.62.0 protobuf==4.25.1 googleapis-common-protos==1.56.4" -# Add torch as a testing dependency for TorchDistributor and DeepspeedTorchDistributor -RUN python3.9 
-m pip install $BASIC_PIP_PKGS unittest-xml-reporting $CONNECT_PIP_PKGS && \ -python3.9 -m pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu && \ -python3.9 -m pip install deepspeed torcheval && \ -python3.9 -m pip cache purge - -# Install Python 3.10 at the last stage to avoid breaking Python 3.9 -RUN add-apt-repository ppa:deadsnakes/ppa -RUN apt-get update && apt-get install -y \ -python3.10 python3.10-distut
(spark) branch master updated (5f48931fcdf7 -> 5e42ecc8163a)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 5f48931fcdf7 [SPARK-47453][SQL][DOCKER][BUILD][TESTS] Upgrade MySQL docker image version to 8.3.0 add 5e42ecc8163a [SPARK-47456][SQL] Support ORC Brotli codec No new revisions were added by this update. Summary of changes: docs/sql-data-sources-orc.md | 2 +- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 4 ++-- .../spark/sql/execution/datasources/orc/OrcCompressionCodec.java | 3 ++- .../org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala| 3 ++- .../spark/sql/execution/datasources/FileSourceCodecSuite.scala | 5 - 5 files changed, 11 insertions(+), 6 deletions(-)
(spark) branch master updated (681b41f0808e -> 5f48931fcdf7)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 681b41f0808e [SPARK-47422][SQL] Support collated strings in array operations add 5f48931fcdf7 [SPARK-47453][SQL][DOCKER][BUILD][TESTS] Upgrade MySQL docker image version to 8.3.0 No new revisions were added by this update. Summary of changes: ...baseOnDocker.scala => MySQLDatabaseOnDocker.scala} | 17 +++-- .../apache/spark/sql/jdbc/MySQLIntegrationSuite.scala | 15 +++ .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 19 --- .../spark/sql/jdbc/v2/MySQLNamespaceSuite.scala | 19 --- 4 files changed, 18 insertions(+), 52 deletions(-) copy connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/{DB2DatabaseOnDocker.scala => MySQLDatabaseOnDocker.scala} (66%)
(spark) branch master updated (9f8147c2a8d2 -> e01ed0da22f2)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 9f8147c2a8d2 [SPARK-47329][SS][DOCS] Add note to persist dataframe while using foreachbatch and stateful streaming query to prevent state from being re-loaded in each batch add e01ed0da22f2 [SPARK-47345][SQL][TESTS][FOLLOW-UP] Rename JSON to XML within XmlFunctionsSuite No new revisions were added by this update. Summary of changes: sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated (cb20fcae951d -> acf17fd67217)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from cb20fcae951d [SPARK-47448][CORE] Enable `spark.shuffle.service.removeShuffle` by default add acf17fd67217 [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job No new revisions were added by this update. Summary of changes: .github/workflows/build_sparkr_window.yml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
(spark) branch master updated (51e8634a5883 -> cb20fcae951d)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 51e8634a5883 [SPARK-47380][CONNECT] Ensure on the server side that the SparkSession is the same add cb20fcae951d [SPARK-47448][CORE] Enable `spark.shuffle.service.removeShuffle` by default No new revisions were added by this update. Summary of changes: core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +- core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala | 1 + docs/configuration.md | 2 +- docs/core-migration-guide.md | 2 ++ 4 files changed, 5 insertions(+), 2 deletions(-)
(spark) branch master updated: [SPARK-47446][CORE] Make `BlockManager` warn before `removeBlockInternal`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a40940a0bc6d [SPARK-47446][CORE] Make `BlockManager` warn before `removeBlockInternal` a40940a0bc6d is described below commit a40940a0bc6de58b5c56b8ad918f338c6e70572f Author: Dongjoon Hyun AuthorDate: Mon Mar 18 12:39:44 2024 -0700 [SPARK-47446][CORE] Make `BlockManager` warn before `removeBlockInternal` ### What changes were proposed in this pull request? This PR aims to make `BlockManager` warn before invoking `removeBlockInternal` by switching the log position. To be clear, 1. For the case where `removeBlockInternal` succeeds, the log messages are identical before and after this PR. 2. For the case where `removeBlockInternal` fails, the user will see one additional warning message like the following which was hidden from the users before this PR. ``` logWarning(s"Putting block $blockId failed") ``` ### Why are the changes needed? When `Put` operation fails, Apache Spark currently tries `removeBlockInternal` first before logging. https://github.com/apache/spark/blob/ce93c9fd86715e2479552628398f6fc11e83b2af/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1554-L1567 On top of that, if `removeBlockInternal` fails consecutively, Spark shows the warning like the following and fails the job. ``` 24/03/18 18:40:46 WARN BlockManager: Putting block broadcast_0 failed due to exception java.nio.file.NoSuchFileException: /data/spark/blockmgr-56a6c418-90be-4d89-9707-ef45f7eaf74c/0e. 24/03/18 18:40:46 WARN BlockManager: Block broadcast_0 was not removed normally. 
24/03/18 18:40:46 INFO TaskSchedulerImpl: Cancelling stage 0 24/03/18 18:40:46 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage cancelled 24/03/18 18:40:46 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) failed in 0.264 s due to Job aborted due to stage failure: Task serialization failed: java.nio.file.NoSuchFileException: /data/spark/blockmgr-56a6c418-90be-4d89-9707-ef45f7eaf74c/0e java.nio.file.NoSuchFileException: /data/spark/blockmgr-56a6c418-90be-4d89-9707-ef45f7eaf74c/0e ``` It's misleading although they might share the same root cause. Since `Put` operation fails before the above failure, we had better switch WARN message to make it clear. ### Does this PR introduce _any_ user-facing change? No. This is a warning message change only. ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45570 from dongjoon-hyun/SPARK-47446. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- core/src/main/scala/org/apache/spark/storage/BlockManager.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala index 228ec5752e1b..89b3914e94af 100644 --- a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala +++ b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala @@ -1561,8 +1561,8 @@ private[spark] class BlockManager( blockInfoManager.unlock(blockId) } } else { -removeBlockInternal(blockId, tellMaster = false) logWarning(s"Putting block $blockId failed") +removeBlockInternal(blockId, tellMaster = false) } res } catch { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
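The ordering change above follows a general pattern: emit the diagnostic before running cleanup that can itself fail, so the original failure is never masked. A minimal Java sketch of the idea (the names below are illustrative stand-ins, not Spark's actual `BlockManager` API):

```java
import java.util.ArrayList;
import java.util.List;

public class PutFailureLogging {
    // Collects warnings so the ordering can be observed; a real system
    // would write to a logger instead.
    static final List<String> warnings = new ArrayList<>();

    static void logWarning(String msg) {
        warnings.add(msg);
    }

    // Warn first, then attempt cleanup: even if removeBlockInternal throws,
    // the "Putting block ... failed" message has already been recorded.
    static void handlePutFailure(String blockId, Runnable removeBlockInternal) {
        logWarning("Putting block " + blockId + " failed");
        removeBlockInternal.run(); // may throw; the warning above survives
    }
}
```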
(spark) branch master updated: [SPARK-47383][CORE] Support `spark.shutdown.timeout` config
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ce93c9fd8671 [SPARK-47383][CORE] Support `spark.shutdown.timeout` config ce93c9fd8671 is described below commit ce93c9fd86715e2479552628398f6fc11e83b2af Author: Rob Reeves AuthorDate: Mon Mar 18 10:36:38 2024 -0700 [SPARK-47383][CORE] Support `spark.shutdown.timeout` config ### What changes were proposed in this pull request? Make the shutdown hook timeout configurable. If this is not defined it falls back to the existing behavior, which uses a default timeout of 30 seconds, or whatever is defined in core-site.xml for the hadoop.service.shutdown.timeout property. ### Why are the changes needed? Spark sometimes times out during the shutdown process. This can result in data left in the queues to be dropped and causes metadata loss (e.g. event logs, anything written by custom listeners). This is not easily configurable before this change. The underlying `org.apache.hadoop.util.ShutdownHookManager` has a default timeout of 30 seconds. It can be configured by setting hadoop.service.shutdown.timeout, but this must be done in the core-site.xml/core-default.xml because a new hadoop conf object is created and there is no opportunity to modify it. ### Does this PR introduce _any_ user-facing change? Yes, a new config `spark.shutdown.timeout` is added. ### How was this patch tested? Manual testing in spark-shell. This behavior is not practical to write a unit test for. ### Was this patch authored or co-authored using generative AI tooling? No Closes #45504 from robreeves/sc_shutdown_timeout. 
Authored-by: Rob Reeves Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/internal/config/package.scala| 10 ++ .../org/apache/spark/util/ShutdownHookManager.scala | 19 +-- 2 files changed, 27 insertions(+), 2 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala index aa240b5cc5b5..e72b9cb694eb 100644 --- a/core/src/main/scala/org/apache/spark/internal/config/package.scala +++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala @@ -2683,4 +2683,14 @@ package object config { .version("4.0.0") .booleanConf .createWithDefault(false) + + private[spark] val SPARK_SHUTDOWN_TIMEOUT_MS = +ConfigBuilder("spark.shutdown.timeout") + .internal() + .doc("Defines the timeout period to wait for all shutdown hooks to be executed. " + +"This must be passed as a system property argument in the Java options, for example " + +"spark.driver.extraJavaOptions=\"-Dspark.shutdown.timeout=60s\".") + .version("4.0.0") + .timeConf(TimeUnit.MILLISECONDS) + .createOptional } diff --git a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala index 4db268604a3e..c6cad9440168 100644 --- a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala +++ b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala @@ -19,12 +19,16 @@ package org.apache.spark.util import java.io.File import java.util.PriorityQueue +import java.util.concurrent.TimeUnit import scala.util.Try import org.apache.hadoop.fs.FileSystem +import org.apache.spark.SparkConf import org.apache.spark.internal.Logging +import org.apache.spark.internal.config.SPARK_SHUTDOWN_TIMEOUT_MS + /** * Various utility methods used by Spark. 
@@ -177,8 +181,19 @@ private [util] class SparkShutdownHookManager { val hookTask = new Runnable() { override def run(): Unit = runAll() } -org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook( - hookTask, FileSystem.SHUTDOWN_HOOK_PRIORITY + 30) +val priority = FileSystem.SHUTDOWN_HOOK_PRIORITY + 30 +// The timeout property must be passed as a Java system property because this +// is initialized before Spark configurations are registered as system +// properties later in initialization. +val timeout = new SparkConf().get(SPARK_SHUTDOWN_TIMEOUT_MS) + +timeout.fold { + org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook( +hookTask, priority) +} { t => + org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook( +hookTask, priority, t, TimeUnit.MILLISECONDS) +} } def runAll(): Unit = { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
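The Scala `timeout.fold { ... } { t => ... }` in the patch dispatches between the two Hadoop `addShutdownHook` overloads depending on whether the optional timeout is configured. The same pattern in Java, with hypothetical stand-ins for the two overloads (the real ones live in `org.apache.hadoop.util.ShutdownHookManager` and register actual JVM hooks):

```java
import java.util.Optional;
import java.util.concurrent.TimeUnit;

public class ShutdownHookRegistration {
    // Stand-in for the overload that uses the default (30s) timeout.
    static String addHook(Runnable task, int priority) {
        return "priority=" + priority + ", default timeout";
    }

    // Stand-in for the overload that takes an explicit timeout.
    static String addHook(Runnable task, int priority, long timeout, TimeUnit unit) {
        return "priority=" + priority + ", timeout=" + timeout + " " + unit;
    }

    // No configured timeout -> default overload; configured -> timed overload.
    static String register(Runnable task, int priority, Optional<Long> timeoutMs) {
        return timeoutMs
            .map(t -> addHook(task, priority, t, TimeUnit.MILLISECONDS))
            .orElseGet(() -> addHook(task, priority));
    }
}
```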
(spark) branch master updated: [SPARK-47435][SQL] Fix overflow issue of MySQL UNSIGNED TINYINT caused by SPARK-45561
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8bd42cbdb6bf [SPARK-47435][SQL] Fix overflow issue of MySQL UNSIGNED TINYINT caused by SPARK-45561 8bd42cbdb6bf is described below commit 8bd42cbdb6bfa40aead94570b06e926f8e8aa9e1 Author: Kent Yao AuthorDate: Mon Mar 18 08:56:55 2024 -0700 [SPARK-47435][SQL] Fix overflow issue of MySQL UNSIGNED TINYINT caused by SPARK-45561 ### What changes were proposed in this pull request? SPARK-45561 mapped java.sql.Types.TINYINT to ByteType in the MySQL dialect, which caused unsigned TINYINT values to overflow, because java.sql.Types reports TINYINT regardless of whether the column is signed or unsigned. In this PR, we record the signedness in the column metadata so that TINYINT can be mapped to short or byte accordingly. ### Why are the changes needed? Bugfix. ### Does this PR introduce _any_ user-facing change? Yes. After this PR, users can read MySQL UNSIGNED TINYINT values again, as in versions before 3.5.0; this has been broken since 3.5.1. ### How was this patch tested? New tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45556 from yaooqinn/SPARK-47435.
Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../spark/sql/jdbc/MySQLIntegrationSuite.scala | 9 ++-- .../spark/sql/jdbc/v2/DB2IntegrationSuite.scala| 9 ++-- .../sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 6 ++- .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 15 -- .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala | 9 ++-- .../sql/jdbc/v2/PostgresIntegrationSuite.scala | 9 ++-- .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 26 ++ .../sql/execution/datasources/jdbc/JdbcUtils.scala | 5 +- .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 10 ++-- .../v2/jdbc/JDBCTableCatalogSuite.scala| 60 -- .../org/apache/spark/sql/jdbc/JDBCSuite.scala | 24 + 11 files changed, 114 insertions(+), 68 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala index b1d239337aa0..79e88f109534 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala @@ -57,10 +57,11 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { conn.prepareStatement("CREATE TABLE numbers (onebit BIT(1), tenbits BIT(10), " + "small SMALLINT, med MEDIUMINT, nor INT, big BIGINT, deci DECIMAL(40,20), flt FLOAT, " - + "dbl DOUBLE, tiny TINYINT)").executeUpdate() + + "dbl DOUBLE, tiny TINYINT, u_tiny TINYINT UNSIGNED)").executeUpdate() + conn.prepareStatement("INSERT INTO numbers VALUES (b'0', b'1000100101', " + "17, 7, 123456789, 123456789012345, 123456789012345.123456789012345, " - + "42.75, 1.0002, -128)").executeUpdate() + + "42.75, 1.0002, -128, 255)").executeUpdate() conn.prepareStatement("CREATE TABLE dates (d DATE, t TIME, dt DATETIME, ts TIMESTAMP, " + "yr YEAR)").executeUpdate() @@ -90,7 +91,7 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite { val rows = df.collect() assert(rows.length == 1) val types = rows(0).toSeq.map(x => x.getClass.toString) -assert(types.length == 10) +assert(types.length == 11) assert(types(0).equals("class java.lang.Boolean")) assert(types(1).equals("class java.lang.Long")) assert(types(2).equals("class java.lang.Integer")) @@ -101,6 +102,7 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { assert(types(7).equals("class java.lang.Double")) assert(types(8).equals("class java.lang.Double")) assert(types(9).equals("class java.lang.Byte")) +assert(types(10).equals("class java.lang.Short")) assert(rows(0).getBoolean(0) == false) assert(rows(0).getLong(1) == 0x225) assert(rows(0).getInt(2) == 17) @@ -112,6 +114,7 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { assert(rows(0).getDouble(7) == 42.75) assert(rows(0).getDouble(8) == 1.0002) assert(rows(0).getByte(9) == 0x80.toByte) +assert(rows(0).getShort(10) == 0xff.toShort) } test("Date types") { diff --git a/connector/docker-integration-tests/src/test/scala/org/apa
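The overflow this fix addresses is easy to reproduce: MySQL's UNSIGNED TINYINT ranges over 0..255, but a signed Java byte only covers -128..127, so 255 read as ByteType wraps to -1. A small sketch of why the dialect now widens unsigned TINYINT to ShortType:

```java
public class UnsignedTinyIntDemo {
    // Widen the raw byte from the JDBC driver into a short, recovering the
    // unsigned 0..255 value that a signed byte cannot represent.
    static short unsignedTinyIntToShort(byte raw) {
        return (short) (raw & 0xFF);
    }

    public static void main(String[] args) {
        byte raw = (byte) 255; // UNSIGNED TINYINT max, as inserted in the test above
        System.out.println(raw);                         // wraps to -1 as a signed byte
        System.out.println(unsignedTinyIntToShort(raw)); // 255 when widened to short
    }
}
```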
(spark) branch master updated (4dc362dbc6c0 -> 1aafe60b3e76)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 4dc362dbc6c0 [SPARK-47438][BUILD] Upgrade jackson to 2.17.0 add 1aafe60b3e76 [SPARK-47442][CORE][TEST] Use port 0 to start worker servers in MasterSuite No new revisions were added by this update. Summary of changes: .../test/scala/org/apache/spark/deploy/master/MasterSuiteBase.scala| 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
(spark) branch master updated: [SPARK-47438][BUILD] Upgrade jackson to 2.17.0
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4dc362dbc6c0 [SPARK-47438][BUILD] Upgrade jackson to 2.17.0 4dc362dbc6c0 is described below commit 4dc362dbc6c039d955e4dceb87e53dfc76ef2a5c Author: panbingkun AuthorDate: Mon Mar 18 08:25:16 2024 -0700 [SPARK-47438][BUILD] Upgrade jackson to 2.17.0 ### What changes were proposed in this pull request? This PR upgrades Jackson from `2.16.1` to `2.17.0`. ### Why are the changes needed? The full release notes: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.17 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GitHub Actions CI. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45562 from panbingkun/SPARK-47438. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 14 +++--- pom.xml | 4 ++-- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index d4b7d38aea22..86da61d89149 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -103,15 +103,15 @@ icu4j/72.1//icu4j-72.1.jar ini4j/0.5.4//ini4j-0.5.4.jar istack-commons-runtime/3.0.8//istack-commons-runtime-3.0.8.jar ivy/2.5.2//ivy-2.5.2.jar -jackson-annotations/2.16.1//jackson-annotations-2.16.1.jar +jackson-annotations/2.17.0//jackson-annotations-2.17.0.jar jackson-core-asl/1.9.13//jackson-core-asl-1.9.13.jar -jackson-core/2.16.1//jackson-core-2.16.1.jar -jackson-databind/2.16.1//jackson-databind-2.16.1.jar -jackson-dataformat-cbor/2.16.1//jackson-dataformat-cbor-2.16.1.jar -jackson-dataformat-yaml/2.16.1//jackson-dataformat-yaml-2.16.1.jar -jackson-datatype-jsr310/2.16.1//jackson-datatype-jsr310-2.16.1.jar
+jackson-core/2.17.0//jackson-core-2.17.0.jar +jackson-databind/2.17.0//jackson-databind-2.17.0.jar +jackson-dataformat-cbor/2.17.0//jackson-dataformat-cbor-2.17.0.jar +jackson-dataformat-yaml/2.17.0//jackson-dataformat-yaml-2.17.0.jar +jackson-datatype-jsr310/2.17.0//jackson-datatype-jsr310-2.17.0.jar jackson-mapper-asl/1.9.13//jackson-mapper-asl-1.9.13.jar -jackson-module-scala_2.13/2.16.1//jackson-module-scala_2.13-2.16.1.jar +jackson-module-scala_2.13/2.17.0//jackson-module-scala_2.13-2.17.0.jar jakarta.annotation-api/2.0.0//jakarta.annotation-api-2.0.0.jar jakarta.inject-api/2.0.1//jakarta.inject-api-2.0.1.jar jakarta.servlet-api/5.0.0//jakarta.servlet-api-5.0.0.jar diff --git a/pom.xml b/pom.xml index 757d911c1229..5cc56a92999d 100644 --- a/pom.xml +++ b/pom.xml @@ -184,8 +184,8 @@ true true 1.9.13 -2.16.1 - 2.16.1 +2.17.0 + 2.17.0 2.3.1 3.0.2 1.1.10.5 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [MINOR][DOCS] Add `Web UI` link to `Other Documents` section of index.md
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 57424b92c5b5 [MINOR][DOCS] Add `Web UI` link to `Other Documents` section of index.md 57424b92c5b5 is described below commit 57424b92c5b5e7c3de680a7d8a6b137911f45666 Author: Matt Braymer-Hayes AuthorDate: Mon Mar 18 07:53:11 2024 -0700 [MINOR][DOCS] Add `Web UI` link to `Other Documents` section of index.md ### What changes were proposed in this pull request? Adds the Web UI to the `Other Documents` list on the main page. ### Why are the changes needed? I found it difficult to find the Web UI docs: it's only linked inside the Monitoring docs. Adding it to the main page will make it easier for people to find and use the docs. ### Does this PR introduce _any_ user-facing change? Yes: adds another cross-reference on the main page. ### How was this patch tested? Visually verified that Markdown still rendered properly. ### Was this patch authored or co-authored using generative AI tooling? No Closes #45534 from mattayes/patch-2. 
Authored-by: Matt Braymer-Hayes Signed-off-by: Dongjoon Hyun --- docs/index.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/index.md b/docs/index.md index 5f3858bec86b..12c53c40c8f7 100644 --- a/docs/index.md +++ b/docs/index.md @@ -138,6 +138,7 @@ options for deployment: * [Configuration](configuration.html): customize Spark via its configuration system * [Monitoring](monitoring.html): track the behavior of your applications +* [Web UI](web-ui.html): view useful information about your applications * [Tuning Guide](tuning.html): best practices to optimize performance and memory use * [Job Scheduling](job-scheduling.html): scheduling resources across and within Spark applications * [Security](security.html): Spark security support @@ -145,7 +146,7 @@ options for deployment: * Integration with other storage systems: * [Cloud Infrastructures](cloud-integration.html) * [OpenStack Swift](storage-openstack-swift.html) -* [Migration Guide](migration-guide.html): Migration guides for Spark components +* [Migration Guide](migration-guide.html): migration guides for Spark components * [Building Spark](building-spark.html): build Spark using the Maven system * [Contributing to Spark](https://spark.apache.org/contributing.html) * [Third Party Projects](https://spark.apache.org/third-party-projects.html): related third party Spark projects - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 7a899e219f5a [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage` 7a899e219f5a is described below commit 7a899e219f5a17ab12aeb8d67738025b7e2b9d9c Author: Huw Campbell AuthorDate: Mon Mar 18 07:38:10 2024 -0700 [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage` ### What changes were proposed in this pull request? Like SPARK-24553, this PR aims to fix redirect issues (incorrect 302) when one is using proxy settings. Change the generated link to be consistent with other links and include a trailing slash ### Why are the changes needed? When using a proxy, an invalid redirect is issued if this is not included ### Does this PR introduce _any_ user-facing change? Only that people will be able to use these links if they are using a proxy ### How was this patch tested? With a proxy installed I went to the location this link would generate and could go to the page, when it redirects with the link as it exists. Edit: Further tested by building a version of our application with this patch applied, the links work now. ### Was this patch authored or co-authored using generative AI tooling? No. Page with working link https://github.com/apache/spark/assets/5205457/dbcd1ffc-b7e6-4f84-8ca7-602c41202bf3";> Goes correctly to https://github.com/apache/spark/assets/5205457/89111c82-b24a-4b33-895f-9c0131e8acb5";> Before it would redirect and we'd get a 404. https://github.com/apache/spark/assets/5205457/1adfeba1-a1f6-4c35-9c39-e077c680baef";> Closes #45527 from HuwCampbell/patch-1. 
Authored-by: Huw Campbell Signed-off-by: Dongjoon Hyun (cherry picked from commit 9b466d329c3c75e89b80109755a41c2d271b8acc) Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala index 7cd7db4088ac..ce3e7cde01b7 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala @@ -174,7 +174,7 @@ private[ui] class StreamingQueryPagedTable( override def row(query: StructuredStreamingRow): Seq[Node] = { val streamingQuery = query.streamingUIData -val statisticsLink = "%s/%s/statistics?id=%s" +val statisticsLink = "%s/%s/statistics/?id=%s" .format(SparkUIUtils.prependBaseUri(request, parent.basePath), parent.prefix, streamingQuery.summary.runId) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new bb7a6138b827 [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage` bb7a6138b827 is described below commit bb7a6138b827975fc827813ab42a2b9074bf8d5e Author: Huw Campbell AuthorDate: Mon Mar 18 07:38:10 2024 -0700 [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage` ### What changes were proposed in this pull request? Like SPARK-24553, this PR aims to fix redirect issues (incorrect 302) when one is using proxy settings. Change the generated link to be consistent with other links and include a trailing slash ### Why are the changes needed? When using a proxy, an invalid redirect is issued if this is not included ### Does this PR introduce _any_ user-facing change? Only that people will be able to use these links if they are using a proxy ### How was this patch tested? With a proxy installed I went to the location this link would generate and could go to the page, when it redirects with the link as it exists. Edit: Further tested by building a version of our application with this patch applied, the links work now. ### Was this patch authored or co-authored using generative AI tooling? No. Page with working link https://github.com/apache/spark/assets/5205457/dbcd1ffc-b7e6-4f84-8ca7-602c41202bf3";> Goes correctly to https://github.com/apache/spark/assets/5205457/89111c82-b24a-4b33-895f-9c0131e8acb5";> Before it would redirect and we'd get a 404. https://github.com/apache/spark/assets/5205457/1adfeba1-a1f6-4c35-9c39-e077c680baef";> Closes #45527 from HuwCampbell/patch-1. 
Authored-by: Huw Campbell Signed-off-by: Dongjoon Hyun (cherry picked from commit 9b466d329c3c75e89b80109755a41c2d271b8acc) Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala index 7cd7db4088ac..ce3e7cde01b7 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala @@ -174,7 +174,7 @@ private[ui] class StreamingQueryPagedTable( override def row(query: StructuredStreamingRow): Seq[Node] = { val streamingQuery = query.streamingUIData -val statisticsLink = "%s/%s/statistics?id=%s" +val statisticsLink = "%s/%s/statistics/?id=%s" .format(SparkUIUtils.prependBaseUri(request, parent.basePath), parent.prefix, streamingQuery.summary.runId) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
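The one-line fix above changes only the URL template, moving the trailing slash in front of the query string. As a hedged illustration (rendered in Python rather than the Scala of `StreamingQueryPage`, and with invented argument values), the patched template behaves like this:

```python
def statistics_link(base_uri: str, prefix: str, run_id: str) -> str:
    # Mirrors the patched template "%s/%s/statistics/?id=%s": the trailing
    # slash sits before the query string, so a reverse proxy does not have
    # to issue an extra 302 redirect to a slash-terminated path.
    return "%s/%s/statistics/?id=%s" % (base_uri, prefix, run_id)

# Example values are invented for illustration only.
print(statistics_link("/proxy/app-123", "StreamingQuery", "run-42"))
# /proxy/app-123/StreamingQuery/statistics/?id=run-42
```

The design point is small but real: without the slash, some proxies canonicalize the path with a redirect, and the redirect can drop or mangle the proxy prefix, producing the 404 described in the PR.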
(spark) branch master updated (d3f12df6e09e -> 9b466d329c3c)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from d3f12df6e09e [SPARK-47437][PYTHON][CONNECT] Correct the error class for `DataFrame.sort*` add 9b466d329c3c [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage` No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch branch-3.4 updated (be0e44e59b3e -> b4e2c6750cb3)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git from be0e44e59b3e [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI add b4e2c6750cb3 [SPARK-47433][PYTHON][DOCS][INFRA][3.4] Update PySpark package dependency with version ranges No new revisions were added by this update. Summary of changes: dev/requirements.txt | 2 +- python/docs/source/getting_started/install.rst | 16 2 files changed, 9 insertions(+), 9 deletions(-)
(spark) branch branch-3.5 updated: [SPARK-47432][PYTHON][CONNECT][DOCS][3.5] Add `pyarrow` upper bound requirement, `<13.0.0`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new cc6912ec612c [SPARK-47432][PYTHON][CONNECT][DOCS][3.5] Add `pyarrow` upper bound requirement, `<13.0.0` cc6912ec612c is described below commit cc6912ec612c30e46e1595860a5519bb1caa221b Author: Dongjoon Hyun AuthorDate: Sun Mar 17 15:15:50 2024 -0700 [SPARK-47432][PYTHON][CONNECT][DOCS][3.5] Add `pyarrow` upper bound requirement, `<13.0.0` ### What changes were proposed in this pull request? This PR aims to add `pyarrow` upper bound requirement, `<13.0.0`, to Apache Spark 3.5.x. ### Why are the changes needed? PyArrow 13.0.0 has breaking changes mentioned by #42920 which is a part of Apache Spark 4.0.0. ### Does this PR introduce _any_ user-facing change? No, this only clarifies the upper bound. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45553 from dongjoon-hyun/SPARK-47432. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/requirements.txt | 2 +- python/docs/source/getting_started/install.rst | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/requirements.txt b/dev/requirements.txt index 597417aba1f3..0749af75aa4b 100644 --- a/dev/requirements.txt +++ b/dev/requirements.txt @@ -3,7 +3,7 @@ py4j # PySpark dependencies (optional) numpy -pyarrow +pyarrow<13.0.0 pandas scipy plotly diff --git a/python/docs/source/getting_started/install.rst b/python/docs/source/getting_started/install.rst index 6822285e9617..e97632a8b384 100644 --- a/python/docs/source/getting_started/install.rst +++ b/python/docs/source/getting_started/install.rst @@ -157,7 +157,7 @@ PackageSupported version Note == = == `py4j` >=0.10.9.7Required `pandas` >=1.0.5 Required for pandas API on Spark and Spark Connect; Optional for Spark SQL -`pyarrow` >=4.0.0 Required for pandas API on Spark and Spark Connect; Optional for Spark SQL +`pyarrow` >=4.0.0,<13.0.0 Required for pandas API on Spark and Spark Connect; Optional for Spark SQL `numpy`>=1.15Required for pandas API on Spark and MLLib DataFrame-based API; Optional for Spark SQL `grpcio` >=1.48,<1.57 Required for Spark Connect `grpcio-status`>=1.48,<1.57 Required for Spark Connect - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
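A bounded requirement such as `pyarrow>=4.0.0,<13.0.0` describes a half-open interval over versions: the lower bound is accepted, the upper bound is not. A minimal sketch of how such a range is evaluated (naive tuple comparison over dotted release versions; real tools like pip follow PEP 440, which additionally handles pre-releases, epochs, and local versions):

```python
def parse_version(v: str) -> tuple:
    # Naive dotted-version parser; sufficient for plain x.y.z releases.
    return tuple(int(part) for part in v.split("."))

def in_range(version: str, lower: str = "4.0.0", upper: str = "13.0.0") -> bool:
    # Half-open interval: lower bound inclusive, upper bound exclusive.
    return parse_version(lower) <= parse_version(version) < parse_version(upper)

print(in_range("12.0.1"))  # True: a 12.x release satisfies the requirement
print(in_range("13.0.0"))  # False: the breaking 13.0.0 release is excluded
```

This is exactly why the upper bound clarifies rather than changes behavior for users already on a supported PyArrow: every previously working version still falls inside the interval.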
(spark) branch branch-3.4 updated: [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new be0e44e59b3e [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI be0e44e59b3e is described below commit be0e44e59b3e71cb11353e11f19146e0d1827432 Author: Ruifeng Zheng AuthorDate: Wed Sep 13 15:51:27 2023 +0800 [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI Pin `pyarrow==12.0.1` in CI to fix test failure, https://github.com/apache/spark/actions/runs/6167186123/job/16738683632 ``` == FAIL [0.095s]: test_from_to_pandas (pyspark.pandas.tests.data_type_ops.test_datetime_ops.DatetimeOpsTests) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 122, in _assert_pandas_equal assert_series_equal( File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 931, in assert_series_equal assert_attr_equal("dtype", left, right, obj=f"Attributes of {obj}") File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 415, in assert_attr_equal raise_assert_detail(obj, msg, left_attr, right_attr) File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 599, in raise_assert_detail raise AssertionError(msg) AssertionError: Attributes of Series are different Attribute "dtype" are different [left]: datetime64[ns] [right]: datetime64[us] ``` No CI and manually test No Closes #42897 from zhengruifeng/pin_pyarrow. 
Authored-by: Ruifeng Zheng Signed-off-by: Ruifeng Zheng (cherry picked from commit e3d2dfa8b514f9358823c3cb1ad6523da8a6646b) Signed-off-by: Dongjoon Hyun (cherry picked from commit 8049a203b8c5f2f8045701916e66cfc786e16b57) Signed-off-by: Dongjoon Hyun --- .github/workflows/build_and_test.yml | 4 ++-- dev/infra/Dockerfile | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index 33747fb5b61d..2184577d5c44 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -252,7 +252,7 @@ jobs: - name: Install Python packages (Python 3.8) if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-')) run: | -python3.8 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy unittest-xml-reporting 'grpcio==1.48.1' 'protobuf==3.19.5' +python3.8 -m pip install 'numpy>=1.20.0' 'pyarrow==12.0.1' pandas scipy unittest-xml-reporting 'grpcio==1.48.1' 'protobuf==3.19.5' python3.8 -m pip list # Run the tests. - name: Run tests @@ -626,7 +626,7 @@ jobs: # See also https://issues.apache.org/jira/browse/SPARK-38279. 
python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme 'sphinx-copybutton==0.5.2' nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1' 'pyzmq<24.0.0' 'sphinxcontrib-applehelp==1.0.4' 'sphinxcontrib-devhelp==1.0.2' 'sphinxcontrib-htmlhelp==2.0.1' 'sphinxcontrib-qthelp==1.0.3' 'sphinxcontrib-serializinghtml==1.1.5' 'nest-asyncio==1.5.8' 'rpds-py==0.16.2' 'alabaster==0.7.13' python3.9 -m pip install ipython_genutils # See SPARK-38517 -python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' pyarrow pandas 'plotly>=4.8' +python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' 'pyarrow==12.0.1' pandas 'plotly>=4.8' python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421 apt-get update -y apt-get install -y ruby ruby-dev diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index 2e78f4af2144..93d8793826ff 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -65,7 +65,7 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library" RUN pypy3 -m pip install numpy 'pandas<=1.5.3' scipy coverage matplotlib -RUN python3.9 -m pip install numpy pyarrow 'pandas<=1.5.3' scipy unittest-xml-reporting plotl
(spark) branch branch-3.5 updated: [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 8049a203b8c5 [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI 8049a203b8c5 is described below commit 8049a203b8c5f2f8045701916e66cfc786e16b57 Author: Ruifeng Zheng AuthorDate: Wed Sep 13 15:51:27 2023 +0800 [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI ### What changes were proposed in this pull request? Pin `pyarrow==12.0.1` in CI ### Why are the changes needed? to fix test failure, https://github.com/apache/spark/actions/runs/6167186123/job/16738683632 ``` == FAIL [0.095s]: test_from_to_pandas (pyspark.pandas.tests.data_type_ops.test_datetime_ops.DatetimeOpsTests) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 122, in _assert_pandas_equal assert_series_equal( File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 931, in assert_series_equal assert_attr_equal("dtype", left, right, obj=f"Attributes of {obj}") File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 415, in assert_attr_equal raise_assert_detail(obj, msg, left_attr, right_attr) File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 599, in raise_assert_detail raise AssertionError(msg) AssertionError: Attributes of Series are different Attribute "dtype" are different [left]: datetime64[ns] [right]: datetime64[us] ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI and manually test ### Was this patch authored or co-authored using generative AI tooling? No Closes #42897 from zhengruifeng/pin_pyarrow. 
Authored-by: Ruifeng Zheng Signed-off-by: Ruifeng Zheng (cherry picked from commit e3d2dfa8b514f9358823c3cb1ad6523da8a6646b) Signed-off-by: Dongjoon Hyun --- .github/workflows/build_and_test.yml | 4 ++-- dev/infra/Dockerfile | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index b0760a955342..8488540b415d 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -258,7 +258,7 @@ jobs: - name: Install Python packages (Python 3.8) if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-')) run: | -python3.8 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy unittest-xml-reporting 'grpcio==1.56.0' 'protobuf==3.20.3' +python3.8 -m pip install 'numpy>=1.20.0' 'pyarrow==12.0.1' pandas scipy unittest-xml-reporting 'grpcio==1.56.0' 'protobuf==3.20.3' python3.8 -m pip list # Run the tests. - name: Run tests @@ -684,7 +684,7 @@ jobs: # See also https://issues.apache.org/jira/browse/SPARK-38279. 
python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme 'sphinx-copybutton==0.5.2' nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1' 'pyzmq<24.0.0' 'sphinxcontrib-applehelp==1.0.4' 'sphinxcontrib-devhelp==1.0.2' 'sphinxcontrib-htmlhelp==2.0.1' 'sphinxcontrib-qthelp==1.0.3' 'sphinxcontrib-serializinghtml==1.1.5' 'nest-asyncio==1.5.8' 'rpds-py==0.16.2' 'alabaster==0.7.13' python3.9 -m pip install ipython_genutils # See SPARK-38517 -python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' pyarrow pandas 'plotly>=4.8' +python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' 'pyarrow==12.0.1' pandas 'plotly>=4.8' python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421 apt-get update -y apt-get install -y ruby ruby-dev diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index d3bae836cc63..d3fcd7ab3622 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -65,7 +65,7 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library" RUN pypy3 -m pip install numpy 'pandas
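Unlike the version-range changes above, `pyarrow==12.0.1` is an exact pin. As a hedged sketch (the helper name `parse_pin` and its regex are invented for illustration, not pip's actual requirement parser), splitting such a pin into package name and version looks like:

```python
import re

def parse_pin(requirement: str) -> tuple:
    # Accept only exact "name==x.y.z" pins; ranges and extras are rejected.
    match = re.fullmatch(r"([A-Za-z0-9_.-]+)==(\d+(?:\.\d+)*)", requirement)
    if match is None:
        raise ValueError(f"not an exact pin: {requirement!r}")
    return match.group(1), match.group(2)

print(parse_pin("pyarrow==12.0.1"))  # ('pyarrow', '12.0.1')
```

An exact pin is the right tool here: the CI failure was a behavior change (pandas conversion producing `datetime64[us]` instead of `datetime64[ns]`), so the job is frozen on the last known-good release rather than constrained by a range.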
(spark) branch master updated: [SPARK-47426][BUILD] Upgrade Guava used by the connect module to `33.1.0-jre`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2dba72100e03 [SPARK-47426][BUILD] Upgrade Guava used by the connect module to `33.1.0-jre` 2dba72100e03 is described below commit 2dba72100e0326f1889ff0be2dc576b1e712ad15 Author: panbingkun AuthorDate: Sun Mar 17 13:52:14 2024 -0700 [SPARK-47426][BUILD] Upgrade Guava used by the connect module to `33.1.0-jre` ### What changes were proposed in this pull request? The pr aims to upgrade Guava used by the `connect` module to `33.1.0-jre`. ### Why are the changes needed? - The new version bring some bug fixes and optimizations as follows: cache: Fixed a bug that could cause https://github.com/google/guava/pull/6851#issuecomment-1931276822. hash: Optimized Checksum-based hash functions for Java 9+. - The full release notes: https://github.com/google/guava/releases/tag/v33.1.0 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45540 from panbingkun/SPARK-47426. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- pom.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pom.xml b/pom.xml index d67ab1c01273..757d911c1229 100644 --- a/pom.xml +++ b/pom.xml @@ -288,7 +288,7 @@ true -33.0.0-jre +33.1.0-jre 1.0.2 1.62.2 1.1.3 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark-website) branch asf-site updated: Update the organization in committers.md (#509)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 3eae7010b9 Update the organization in committers.md (#509) 3eae7010b9 is described below commit 3eae7010b9f3cc01ceabe5036c0bd8910ccb8c67 Author: Jerry Shao AuthorDate: Sat Mar 16 20:53:28 2024 -0700 Update the organization in committers.md (#509) --- committers.md| 2 +- site/committers.html | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/committers.md b/committers.md index 58aedb94fd..17530a2411 100644 --- a/committers.md +++ b/committers.md @@ -73,7 +73,7 @@ navigation: |Josh Rosen|Stripe| |Sandy Ryza|Remix| |Kousuke Saruta|NTT Data| -|Saisai Shao|Tencent| +|Saisai Shao|Datastrato| |Prashant Sharma|IBM| |Gabor Somogyi|Apple| |Ram Sriharsha|Databricks| diff --git a/site/committers.html b/site/committers.html index 8a9839aa91..22e2f4c481 100644 --- a/site/committers.html +++ b/site/committers.html @@ -403,7 +403,7 @@ Saisai Shao - Tencent + Datastrato Prashant Sharma - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-47428][BUILD][3.4] Upgrade Jetty to 9.4.54.v20240208
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 3c41b1d97e1f [SPARK-47428][BUILD][3.4] Upgrade Jetty to 9.4.54.v20240208 3c41b1d97e1f is described below commit 3c41b1d97e1f5ff9f74f9ea72f7ea92dcbca2122 Author: Dongjoon Hyun AuthorDate: Fri Mar 15 22:42:17 2024 -0700 [SPARK-47428][BUILD][3.4] Upgrade Jetty to 9.4.54.v20240208 ### What changes were proposed in this pull request? This PR aims to upgrade Jetty to 9.4.54.v20240208 for Apache Spark 3.4.3. ### Why are the changes needed? To bring the latest bug fixes. - https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.54.v20240208 - https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.53.v20231009 - https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.52.v20230823 - https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.51.v20230217 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45544 from dongjoon-hyun/SPARK-47428-3.4. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-2-hive-2.3 | 2 +- dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++-- pom.xml | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 b/dev/deps/spark-deps-hadoop-2-hive-2.3 index 691c83632b38..a94fbcd0ca77 100644 --- a/dev/deps/spark-deps-hadoop-2-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-2-hive-2.3 @@ -143,7 +143,7 @@ jersey-hk2/2.36//jersey-hk2-2.36.jar jersey-server/2.36//jersey-server-2.36.jar jetty-sslengine/6.1.26//jetty-sslengine-6.1.26.jar jetty-util/6.1.26//jetty-util-6.1.26.jar -jetty-util/9.4.50.v20221201//jetty-util-9.4.50.v20221201.jar +jetty-util/9.4.54.v20240208//jetty-util-9.4.54.v20240208.jar jetty/6.1.26//jetty-6.1.26.jar jline/2.14.6//jline-2.14.6.jar joda-time/2.12.2//joda-time-2.12.2.jar diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 4d94cb5c699e..99665da7d16a 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -128,8 +128,8 @@ jersey-container-servlet/2.36//jersey-container-servlet-2.36.jar jersey-hk2/2.36//jersey-hk2-2.36.jar jersey-server/2.36//jersey-server-2.36.jar jettison/1.1//jettison-1.1.jar -jetty-util-ajax/9.4.50.v20221201//jetty-util-ajax-9.4.50.v20221201.jar -jetty-util/9.4.50.v20221201//jetty-util-9.4.50.v20221201.jar +jetty-util-ajax/9.4.54.v20240208//jetty-util-ajax-9.4.54.v20240208.jar +jetty-util/9.4.54.v20240208//jetty-util-9.4.54.v20240208.jar jline/2.14.6//jline-2.14.6.jar joda-time/2.12.2//joda-time-2.12.2.jar jodd-core/3.5.2//jodd-core-3.5.2.jar diff --git a/pom.xml b/pom.xml index 373d17b76c09..77218d162c41 100644 --- a/pom.xml +++ b/pom.xml @@ -143,7 +143,7 @@ 1.12.3 1.8.6 shaded-protobuf -9.4.50.v20221201 +9.4.54.v20240208 4.0.3 0.10.0 2.5.1 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 210e80e8b7ba [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job 210e80e8b7ba is described below commit 210e80e8b7baa5fc1e6462615bc8134a4c90647c Author: Dongjoon Hyun AuthorDate: Tue Oct 17 23:38:56 2023 -0700 [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job ### What changes were proposed in this pull request? This PR aims to skip `Unidoc` and `MIMA` phases in many general test pipelines. `mima` test is moved to `lint` job. ### Why are the changes needed? By having an independent document generation and mima checking GitHub Action job, we can skip them in the following many jobs. https://github.com/apache/spark/blob/73f9f5296e36541db78ab10c4c01a56fbc17cca8/.github/workflows/build_and_test.yml#L142-L190 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manually check the GitHub action logs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43422 from dongjoon-hyun/SPARK-45587. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit 8c6eeb8ab0180368cc60de8b2dbae7457bee5794) Signed-off-by: Dongjoon Hyun --- .github/workflows/build_and_test.yml | 4 1 file changed, 4 insertions(+) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index 13527119e51a..33747fb5b61d 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -198,6 +198,8 @@ jobs: HIVE_PROFILE: ${{ matrix.hive }} GITHUB_PREV_SHA: ${{ github.event.before }} SPARK_LOCAL_IP: localhost + SKIP_UNIDOC: true + SKIP_MIMA: true SKIP_PACKAGING: true steps: - name: Checkout Spark repository @@ -578,6 +580,8 @@ jobs: run: ./dev/check-license - name: Dependencies test run: ./dev/test-dependencies.sh +- name: MIMA test + run: ./dev/mima - name: Scala linter run: ./dev/lint-scala - name: Java linter - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 8c6eeb8ab018 [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job 8c6eeb8ab018 is described below commit 8c6eeb8ab0180368cc60de8b2dbae7457bee5794 Author: Dongjoon Hyun AuthorDate: Tue Oct 17 23:38:56 2023 -0700 [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job ### What changes were proposed in this pull request? This PR aims to skip `Unidoc` and `MIMA` phases in many general test pipelines. `mima` test is moved to `lint` job. ### Why are the changes needed? By having an independent document generation and mima checking GitHub Action job, we can skip them in the following many jobs. https://github.com/apache/spark/blob/73f9f5296e36541db78ab10c4c01a56fbc17cca8/.github/workflows/build_and_test.yml#L142-L190 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manually check the GitHub action logs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43422 from dongjoon-hyun/SPARK-45587. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .github/workflows/build_and_test.yml | 4 1 file changed, 4 insertions(+) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index ad8685754b31..b0760a955342 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -204,6 +204,8 @@ jobs: HIVE_PROFILE: ${{ matrix.hive }} GITHUB_PREV_SHA: ${{ github.event.before }} SPARK_LOCAL_IP: localhost + SKIP_UNIDOC: true + SKIP_MIMA: true SKIP_PACKAGING: true steps: - name: Checkout Spark repository @@ -627,6 +629,8 @@ jobs: run: ./dev/check-license - name: Dependencies test run: ./dev/test-dependencies.sh +- name: MIMA test + run: ./dev/mima - name: Scala linter run: ./dev/lint-scala - name: Java linter - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
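The `SKIP_UNIDOC` and `SKIP_MIMA` variables added above gate phases inside the build scripts, the same way the pre-existing `SKIP_PACKAGING` does. A hedged sketch of that pattern (the function name is invented, and Spark's actual gating lives in shell scripts; this is a Python rendering for illustration):

```python
import os

def phase_enabled(skip_var: str, env=None) -> bool:
    # A build phase runs unless its SKIP_* variable is set to "true".
    env = os.environ if env is None else env
    return env.get(skip_var, "false").lower() != "true"

# Values as set for the general test jobs in build_and_test.yml.
ci_env = {"SKIP_UNIDOC": "true", "SKIP_MIMA": "true", "SKIP_PACKAGING": "true"}
print(phase_enabled("SKIP_UNIDOC", ci_env))  # False: unidoc phase is skipped
print(phase_enabled("SKIP_MIMA", {}))        # True: phase runs by default
```

The payoff is in the "Why" section above: with MIMA promoted to the `lint` job, every general test matrix entry can set these flags and avoid repeating the same expensive checks.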
(spark) branch branch-3.5 updated: [SPARK-47428][BUILD][3.5] Upgrade Jetty to 9.4.54.v20240208
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new d59425275cdd [SPARK-47428][BUILD][3.5] Upgrade Jetty to 9.4.54.v20240208 d59425275cdd is described below commit d59425275cdd0ff678a5bcccef4c7b74fe8170cb Author: Dongjoon Hyun AuthorDate: Fri Mar 15 22:28:45 2024 -0700 [SPARK-47428][BUILD][3.5] Upgrade Jetty to 9.4.54.v20240208 ### What changes were proposed in this pull request? This PR aims to upgrade Jetty to 9.4.54.v20240208 ### Why are the changes needed? To bring the latest bug fixes. - https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.54.v20240208 - https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.53.v20231009 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45543 from dongjoon-hyun/SPARK-47428. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++-- pom.xml | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index c76702cd0af0..8ecf931bf513 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -130,8 +130,8 @@ jersey-container-servlet/2.40//jersey-container-servlet-2.40.jar jersey-hk2/2.40//jersey-hk2-2.40.jar jersey-server/2.40//jersey-server-2.40.jar jettison/1.1//jettison-1.1.jar -jetty-util-ajax/9.4.52.v20230823//jetty-util-ajax-9.4.52.v20230823.jar -jetty-util/9.4.52.v20230823//jetty-util-9.4.52.v20230823.jar +jetty-util-ajax/9.4.54.v20240208//jetty-util-ajax-9.4.54.v20240208.jar +jetty-util/9.4.54.v20240208//jetty-util-9.4.54.v20240208.jar jline/2.14.6//jline-2.14.6.jar joda-time/2.12.5//joda-time-2.12.5.jar jodd-core/3.5.2//jodd-core-3.5.2.jar diff --git a/pom.xml b/pom.xml index 5db3c78e00eb..fb6208777d3f 100644 --- a/pom.xml +++ b/pom.xml @@ -143,7 +143,7 @@ 1.13.1 1.9.2 shaded-protobuf -9.4.52.v20230823 +9.4.54.v20240208 4.0.3 0.10.0
(spark) branch master updated (4437e6e21237 -> 6bf031796c8c)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 4437e6e21237 [SPARK-47419][CONNECT] Move `log4j2-defaults.properties` to `common/utils` add 6bf031796c8c [SPARK-44740][CONNECT][TESTS][FOLLOWUP] Deduplicate `test_metadata` No new revisions were added by this update. Summary of changes: python/pyspark/sql/tests/connect/test_connect_session.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated (b7aa9740249b -> 4437e6e21237)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from b7aa9740249b [SPARK-47407][SQL] Support java.sql.Types.NULL map to NullType add 4437e6e21237 [SPARK-47419][CONNECT] Move `log4j2-defaults.properties` to `common/utils` No new revisions were added by this update. Summary of changes: .../utils}/src/main/resources/org/apache/spark/log4j2-defaults.properties | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename {core => common/utils}/src/main/resources/org/apache/spark/log4j2-defaults.properties (100%)
(spark) branch master updated: [SPARK-47234][BUILD] Upgrade Scala to 2.13.13
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 56cfc89e8f15 [SPARK-47234][BUILD] Upgrade Scala to 2.13.13 56cfc89e8f15 is described below commit 56cfc89e8f1599fe859db1bd6628a9b07d53bed4 Author: panbingkun AuthorDate: Thu Mar 14 22:40:54 2024 -0700 [SPARK-47234][BUILD] Upgrade Scala to 2.13.13 ### What changes were proposed in this pull request? The pr aims to upgrade scala from `2.13.12` to `2.13.13`. ### Why are the changes needed? - The new version bring some bug fixes: https://github.com/scala/scala/pull/10525 https://github.com/scala/scala/pull/10528 - The release notes as follows: https://github.com/scala/scala/releases/tag/v2.13.13 ### Does this PR introduce _any_ user-facing change? Yes, The `scala` version is changed from `2.13.12` to `2.13.13`. ### How was this patch tested? - Pass GA. - After the master is upgraded to this version `2.13.13`, we need to continue to observe. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45342 from panbingkun/SPARK-47234. 
Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 8 docs/_config.yml | 2 +- pom.xml | 4 ++-- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 2e091cb3638e..d4b7d38aea22 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -139,7 +139,7 @@ jettison/1.5.4//jettison-1.5.4.jar jetty-util-ajax/11.0.20//jetty-util-ajax-11.0.20.jar jetty-util/11.0.20//jetty-util-11.0.20.jar jline/2.14.6//jline-2.14.6.jar -jline/3.22.0//jline-3.22.0.jar +jline/3.24.1//jline-3.24.1.jar jna/5.13.0//jna-5.13.0.jar joda-time/2.12.7//joda-time-2.12.7.jar jodd-core/3.5.2//jodd-core-3.5.2.jar @@ -245,11 +245,11 @@ py4j/0.10.9.7//py4j-0.10.9.7.jar remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar rocksdbjni/8.11.3//rocksdbjni-8.11.3.jar scala-collection-compat_2.13/2.7.0//scala-collection-compat_2.13-2.7.0.jar -scala-compiler/2.13.12//scala-compiler-2.13.12.jar -scala-library/2.13.12//scala-library-2.13.12.jar +scala-compiler/2.13.13//scala-compiler-2.13.13.jar +scala-library/2.13.13//scala-library-2.13.13.jar scala-parallel-collections_2.13/1.0.4//scala-parallel-collections_2.13-1.0.4.jar scala-parser-combinators_2.13/2.3.0//scala-parser-combinators_2.13-2.3.0.jar -scala-reflect/2.13.12//scala-reflect-2.13.12.jar +scala-reflect/2.13.13//scala-reflect-2.13.13.jar scala-xml_2.13/2.2.0//scala-xml_2.13-2.2.0.jar slf4j-api/2.0.12//slf4j-api-2.0.12.jar snakeyaml-engine/2.7//snakeyaml-engine-2.7.jar diff --git a/docs/_config.yml b/docs/_config.yml index 7a305ceea67b..19183f85df23 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -22,7 +22,7 @@ include: SPARK_VERSION: 4.0.0-SNAPSHOT SPARK_VERSION_SHORT: 4.0.0 SCALA_BINARY_VERSION: "2.13" -SCALA_VERSION: "2.13.12" +SCALA_VERSION: "2.13.13" SPARK_ISSUE_TRACKER_URL: https://issues.apache.org/jira/browse/SPARK SPARK_GITHUB_URL: https://github.com/apache/spark # 
Before a new release, we should: diff --git a/pom.xml b/pom.xml index 6a811e74e7f8..d67ab1c01273 100644 --- a/pom.xml +++ b/pom.xml @@ -172,7 +172,7 @@ 3.2.2 4.4 -2.13.12 +2.13.13 2.13 2.2.0 @@ -226,7 +226,7 @@ ./python/pyspark/sql/pandas/utils.py, and ./python/setup.py too. --> 15.0.0 -2.5.11 +3.0.0-M1 org.fusesource.leveldbjni - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (213399b61de5 -> fe0aa1edff04)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 213399b61de5 [SPARK-47396][SQL] Add a general mapping for TIME WITHOUT TIME ZONE to TimestampNTZType add fe0aa1edff04 [SPARK-47402][BUILD] Upgrade `ZooKeeper` to 3.9.2 No new revisions were added by this update. Summary of changes: dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++-- pom.xml | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (7b4ab4fa452d -> 213399b61de5)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 7b4ab4fa452d [SPARK-47387][SQL] Remove some unused error classes add 213399b61de5 [SPARK-47396][SQL] Add a general mapping for TIME WITHOUT TIME ZONE to TimestampNTZType No new revisions were added by this update. Summary of changes: .../spark/sql/execution/datasources/jdbc/JdbcUtils.scala | 1 + .../src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala | 10 ++ 2 files changed, 11 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47390][SQL][FOLLOWUP] Fix TIME_WITH_TIMEZONE mapping in PostgresDialect
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d41d5ecda8c1 [SPARK-47390][SQL][FOLLOWUP] Fix TIME_WITH_TIMEZONE mapping in PostgresDialect d41d5ecda8c1 is described below commit d41d5ecda8c11d7e8f6a1fafa1d2be97c0f49f04 Author: Kent Yao AuthorDate: Thu Mar 14 10:30:48 2024 -0700 [SPARK-47390][SQL][FOLLOWUP] Fix TIME_WITH_TIMEZONE mapping in PostgresDialect ### What changes were proposed in this pull request? This PR fixes a bug in SPARK-47390, we shall separate TIME from TIMESTAMP case-match branch ### Why are the changes needed? bugfix ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? local test with #45519 merged together ### Was this patch authored or co-authored using generative AI tooling? no Closes #45522 from yaooqinn/SPARK-47390-F. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala index 7d8ed70b2bd1..9b286620a140 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala @@ -58,10 +58,14 @@ private object PostgresDialect extends JdbcDialect with SQLConfHelper { // See SPARK-34333 and https://github.com/pgjdbc/pgjdbc/issues/100 Some(StringType) case Types.TIMESTAMP -if "timestamptz".equalsIgnoreCase(typeName) || "timetz".equalsIgnoreCase(typeName) => +if "timestamptz".equalsIgnoreCase(typeName) => // timestamptz represents timestamp with time zone, currently it maps to Types.TIMESTAMP. 
// We need to change to Types.TIMESTAMP_WITH_TIMEZONE if the upstream changes. Some(TimestampType) + case Types.TIME if "timetz".equalsIgnoreCase(typeName) => +// timetz represents time with time zone, currently it maps to Types.TIME. +// We need to change to Types.TIME_WITH_TIMEZONE if the upstream changes. +Some(TimestampType) case Types.OTHER => Some(StringType) case _ if "text".equalsIgnoreCase(typeName) => Some(StringType) // sqlType is Types.VARCHAR case Types.ARRAY => - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
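The fixed branch structure above can be sketched outside Spark. The key observation in the followup is that the driver reports `timestamptz` under `java.sql.Types.TIMESTAMP` but `timetz` under `java.sql.Types.TIME`, so a `timetz` guard inside the `Types.TIMESTAMP` branch could never fire. The `CatalystType` enum below is a hypothetical stand-in for Spark's Catalyst type objects, used only for illustration:

```java
import java.sql.Types;
import java.util.Optional;

public class PgTypeMapping {
    // Hypothetical stand-in for Spark's Catalyst types (TimestampType, StringType, ...).
    enum CatalystType { TimestampType, StringType }

    // Sketch of the corrected dispatch: timestamptz arrives as Types.TIMESTAMP,
    // while timetz arrives as Types.TIME, so each needs its own branch.
    static Optional<CatalystType> getCatalystType(int sqlType, String typeName) {
        if (sqlType == Types.TIMESTAMP && "timestamptz".equalsIgnoreCase(typeName)) {
            // timestamp with time zone; may move to Types.TIMESTAMP_WITH_TIMEZONE upstream.
            return Optional.of(CatalystType.TimestampType);
        }
        if (sqlType == Types.TIME && "timetz".equalsIgnoreCase(typeName)) {
            // time with time zone; may move to Types.TIME_WITH_TIMEZONE upstream.
            return Optional.of(CatalystType.TimestampType);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(getCatalystType(Types.TIME, "timetz"));      // Optional[TimestampType]
        System.out.println(getCatalystType(Types.TIMESTAMP, "timetz")); // Optional.empty (the old bug: this pair never matched)
    }
}
```

The bug in the original code was exactly the second case in `main`: `timetz` was only checked while `sqlType` was `Types.TIMESTAMP`, a combination the driver never produces.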
(spark) branch master updated (481597cd2d79 -> b98accd9d931)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 481597cd2d79 [SPARK-47400][BUILD] Upgrade `gcs-connector` to 2.2.20 add b98accd9d931 [SPARK-47401][K8S][DOCS] Update `YuniKorn` docs with v1.5 No new revisions were added by this update. Summary of changes: docs/running-on-kubernetes.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47400][BUILD] Upgrade `gcs-connector` to 2.2.20
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 481597cd2d79 [SPARK-47400][BUILD] Upgrade `gcs-connector` to 2.2.20 481597cd2d79 is described below commit 481597cd2d790e168cde113bf13b34fdb471f377 Author: Dongjoon Hyun AuthorDate: Thu Mar 14 09:41:03 2024 -0700 [SPARK-47400][BUILD] Upgrade `gcs-connector` to 2.2.20 ### What changes were proposed in this pull request? This PR aims to upgrade `gcs-connector` to 2.2.20. ### Why are the changes needed? To bring the latest updates. - https://github.com/GoogleCloudDataproc/hadoop-connectors/releases/tag/v2.2.20 - Add support for renaming folders using rename backend API for Hierarchical namespace buckets - Upgrade java-storage to 2.32.1 and upgrade the version of related dependencies - https://github.com/GoogleCloudDataproc/hadoop-connectors/releases/tag/v2.2.10 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? ``` $ dev/make-distribution.sh -Phadoop-cloud $ cd dist $ export KEYFILE=~/.ssh/apache-spark.json $ export EMAIL=$(jq -r '.client_email' < $KEYFILE) $ export PRIVATE_KEY_ID=$(jq -r '.private_key_id' < $KEYFILE) $ export PRIVATE_KEY="$(jq -r '.private_key' < $KEYFILE)" $ bin/spark-shell \ -c spark.hadoop.fs.gs.auth.service.account.email=$EMAIL \ -c spark.hadoop.fs.gs.auth.service.account.private.key.id=$PRIVATE_KEY_ID \ -c spark.hadoop.fs.gs.auth.service.account.private.key="$PRIVATE_KEY" Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 4.0.0-SNAPSHOT /_/ Using Scala version 2.13.12 (OpenJDK 64-Bit Server VM, Java 21.0.2) Type in expressions to have them evaluated. Type :help for more information.
24/03/14 09:33:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Spark context Web UI available at http://localhost:4040 Spark context available as 'sc' (master = local[*], app id = local-1710434021996). Spark session available as 'spark'. scala> spark.read.text("gs://apache-spark-bucket/README.md").count() val res0: Long = 124 scala> spark.read.orc("examples/src/main/resources/users.orc").write.mode("overwrite").orc("gs://apache-spark-bucket/users.orc") scala> spark.read.orc("gs://apache-spark-bucket/users.orc").show() +--+--++ | name|favorite_color|favorite_numbers| +--+--++ |Alyssa| NULL| [3, 9, 15, 20]| | Ben| red| []| +--+--+----+ scala> ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45521 from dongjoon-hyun/SPARK-47400. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 52d91d938ffb..1f915789e3ea 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -66,7 +66,7 @@ dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-met eclipse-collections-api/11.1.0//eclipse-collections-api-11.1.0.jar eclipse-collections/11.1.0//eclipse-collections-11.1.0.jar flatbuffers-java/23.5.26//flatbuffers-java-23.5.26.jar -gcs-connector/hadoop3-2.2.18/shaded/gcs-connector-hadoop3-2.2.18-shaded.jar +gcs-connector/hadoop3-2.2.20/shaded/gcs-connector-hadoop3-2.2.20-shaded.jar gmetric4j/1.0.10//gmetric4j-1.0.10.jar gson/2.2.4//gson-2.2.4.jar guava/14.0.1//guava-14.0.1.jar diff --git a/pom.xml b/pom.xml index 3f82f6321d5a..ecb0c3891e4e 100644 --- a/pom.xml +++ b/pom.xml @@ -163,7 +163,7 @@ 2.20.160 0.12.8 -hadoop3-2.2.18 +hadoop3-2.2.20 4.5.14 4.4.16 - To unsubscribe, e-mail: 
commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47384][BUILD] Upgrade RoaringBitmap to 1.0.5
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 205e826e7052 [SPARK-47384][BUILD] Upgrade RoaringBitmap to 1.0.5 205e826e7052 is described below commit 205e826e7052f59f90673d8f1388e727136b5ff7 Author: panbingkun AuthorDate: Thu Mar 14 08:48:44 2024 -0700 [SPARK-47384][BUILD] Upgrade RoaringBitmap to 1.0.5 ### What changes were proposed in this pull request? The PR aims to upgrade `RoaringBitmap` from `1.0.1` to `1.0.5`. ### Why are the changes needed? Release notes: https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/1.0.5 This version includes several bug fixes, e.g.: - fix RoaringBitmap BatchIterator's advanceIfNeeded to handle run lengths of zero - fix RangeBitmap#between bug in full section after empty section ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45507 from panbingkun/SPARK-47384. 
Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt | 8 core/benchmarks/MapStatusesConvertBenchmark-results.txt | 8 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 4 files changed, 10 insertions(+), 10 deletions(-) diff --git a/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt index 4ca3c17fa45e..76c3a2ad6fb9 100644 --- a/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt +++ b/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt @@ -2,12 +2,12 @@ MapStatuses Convert Benchmark -OpenJDK 64-Bit Server VM 21.0.1+12-LTS on Linux 5.15.0-1053-azure +OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure AMD EPYC 7763 64-Core Processor MapStatuses Convert: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -Num Maps: 5 Fetch partitions:500689694 5 0.0 688848614.0 1.0X -Num Maps: 5 Fetch partitions:1000 1511 1517 7 0.0 1511337028.0 0.5X -Num Maps: 5 Fetch partitions:1500 2279 2298 20 0.0 2278703144.0 0.3X +Num Maps: 5 Fetch partitions:500696699 3 0.0 695980122.0 1.0X +Num Maps: 5 Fetch partitions:1000 1593 1615 19 0.0 1592993119.0 0.4X +Num Maps: 5 Fetch partitions:1500 2455 2476 22 0.0 2454771901.0 0.3X diff --git a/core/benchmarks/MapStatusesConvertBenchmark-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-results.txt index a5cd0cf9b05b..eafd72dbe8b8 100644 --- a/core/benchmarks/MapStatusesConvertBenchmark-results.txt +++ b/core/benchmarks/MapStatusesConvertBenchmark-results.txt @@ -2,12 +2,12 @@ MapStatuses Convert Benchmark -OpenJDK 64-Bit Server VM 17.0.9+9-LTS on Linux 5.15.0-1053-azure +OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1016-azure AMD EPYC 7763 64-Core Processor MapStatuses Convert: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -Num Maps: 5 Fetch partitions:500646655 13 0.0 645845205.0 1.0X -Num Maps: 5 Fetch 
partitions:1000 1175 1195 18 0.0 1174727440.0 0.5X -Num Maps: 5 Fetch partitions:1500 1767 1830 55 0.0 1767363076.0 0.4X +Num Maps: 5 Fetch partitions:500714716 2 0.0 713899011.0 1.0X +Num Maps: 5 Fetch partitions:1000 1602 1647 59 0.0 1602358288.0 0.4X +Num Maps: 5 Fetch partitions:1500 2517 2538 22 0.0 2517027078.0 0.3X diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 6b357b3
(spark) branch master updated (5ce150735bc5 -> 168346f93303)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 5ce150735bc5 [SPARK-47394][SQL] Support TIMESTAMP WITH TIME ZONE for H2Dialect add 168346f93303 [SPARK-47391][SQL] Remove the test case workaround for JDK 8 No new revisions were added by this update. Summary of changes: .../catalyst/encoders/ExpressionEncoderSuite.scala | 71 -- 1 file changed, 71 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47394][SQL] Support TIMESTAMP WITH TIME ZONE for H2Dialect
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5ce150735bc5 [SPARK-47394][SQL] Support TIMESTAMP WITH TIME ZONE for H2Dialect 5ce150735bc5 is described below commit 5ce150735bc57f482f18fa5a04d16caae0e24041 Author: Kent Yao AuthorDate: Thu Mar 14 07:49:41 2024 -0700 [SPARK-47394][SQL] Support TIMESTAMP WITH TIME ZONE for H2Dialect ### What changes were proposed in this pull request? Following the guidelines of SPARK-47375, this PR supports TIMESTAMP WITH TIME ZONE for H2Dialect and maps it to TimestampType regardless of the option `preferTimestampNTZ` https://www.h2database.com/html/datatypes.html#timestamp_with_time_zone_type ### Why are the changes needed? H2Dialect improvement, we currently don't have a default mapping for `java.sql.Types.TIME_WITH_TIMEZONE, TIMESTAMP_WITH_TIMEZONE` ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #45516 from yaooqinn/SPARK-47394. 
Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/jdbc/H2Dialect.scala| 3 ++- .../scala/org/apache/spark/sql/jdbc/JDBCSuite.scala| 18 +++--- 2 files changed, 13 insertions(+), 8 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala index 74eca7e48577..f4a1650b3e8c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala @@ -35,7 +35,7 @@ import org.apache.spark.sql.connector.catalog.functions.UnboundFunction import org.apache.spark.sql.connector.catalog.index.TableIndex import org.apache.spark.sql.connector.expressions.{Expression, FieldReference, NamedReference} import org.apache.spark.sql.execution.datasources.jdbc.{JDBCOptions, JdbcUtils} -import org.apache.spark.sql.types.{BooleanType, ByteType, DataType, DecimalType, MetadataBuilder, ShortType, StringType} +import org.apache.spark.sql.types.{BooleanType, ByteType, DataType, DecimalType, MetadataBuilder, ShortType, StringType, TimestampType} private[sql] object H2Dialect extends JdbcDialect { override def canHandle(url: String): Boolean = @@ -68,6 +68,7 @@ private[sql] object H2Dialect extends JdbcDialect { val scale = if (null != md) md.build().getLong("scale") else 0L val selectedScale = (DecimalType.MAX_PRECISION * (scale.toDouble / size.toDouble)).toInt Option(DecimalType(DecimalType.MAX_PRECISION, selectedScale)) + case Types.TIMESTAMP_WITH_TIMEZONE | Types.TIME_WITH_TIMEZONE => Some(TimestampType) case _ => None } } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala index b8ca70e0b175..8f286eaa2c54 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala @@ -1467,13 +1467,6 @@ class 
JDBCSuite extends QueryTest with SharedSparkSession { } test("unsupported types") { -checkError( - exception = intercept[SparkSQLException] { -spark.read.jdbc(urlWithUserAndPass, "TEST.TIMEZONE", new Properties()).collect() - }, - errorClass = "UNRECOGNIZED_SQL_TYPE", - parameters = -Map("typeName" -> "TIMESTAMP WITH TIME ZONE", "jdbcType" -> "TIMESTAMP_WITH_TIMEZONE")) checkError( exception = intercept[SparkSQLException] { spark.read.jdbc(urlWithUserAndPass, "TEST.ARRAY_TABLE", new Properties()).collect() @@ -1482,6 +1475,17 @@ class JDBCSuite extends QueryTest with SharedSparkSession { parameters = Map("typeName" -> "INTEGER ARRAY", "jdbcType" -> "ARRAY")) } + + test("SPARK-47394: Convert TIMESTAMP WITH TIME ZONE to TimestampType") { +Seq(true, false).foreach { prefer => + val df = spark.read +.option("preferTimestampNTZ", prefer) +.jdbc(urlWithUserAndPass, "TEST.TIMEZONE", new Properties()) + val expected = sql("select timestamp'1999-01-08 04:05:06.543544-08:00'") + checkAnswer(df, expected) +} + } + test("SPARK-19318: Connection properties keys should be case-sensitive.") { def testJdbcOptions(options: JDBCOptions): Unit = { // Spark JDBC data source options are case-insensitive - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
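The mapping added for H2 above can be modeled in isolation: both zoned JDBC constants resolve to `TimestampType` no matter what the reader prefers for plain timestamps, because a value with an offset cannot be represented as a "no time zone" timestamp without losing information. This is a sketch under assumed names — `CatalystType` is a stand-in, and the plain-`TIMESTAMP` branch only illustrates where `preferTimestampNTZ` does apply in Spark's generic JDBC path:

```java
import java.sql.Types;
import java.util.Optional;

public class H2TypeMapping {
    // Illustrative stand-ins for Spark's Catalyst timestamp types.
    enum CatalystType { TimestampType, TimestampNTZType }

    // TIMESTAMP WITH TIME ZONE always carries an offset, so it maps to
    // TimestampType even when the reader prefers NTZ for plain timestamps.
    static Optional<CatalystType> getCatalystType(int sqlType, boolean preferTimestampNTZ) {
        switch (sqlType) {
            case Types.TIMESTAMP_WITH_TIMEZONE:
            case Types.TIME_WITH_TIMEZONE:
                return Optional.of(CatalystType.TimestampType); // preferTimestampNTZ ignored here
            case Types.TIMESTAMP:
                return Optional.of(preferTimestampNTZ
                        ? CatalystType.TimestampNTZType
                        : CatalystType.TimestampType);
            default:
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(getCatalystType(Types.TIMESTAMP_WITH_TIMEZONE, true)); // Optional[TimestampType]
        System.out.println(getCatalystType(Types.TIMESTAMP, true));               // Optional[TimestampNTZType]
    }
}
```

This mirrors the new `JDBCSuite` test, which checks the result is the same for both settings of `preferTimestampNTZ` when reading the `TEST.TIMEZONE` table.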
(spark) branch master updated: [SPARK-47378][PROTOBUF][TESTS] Make the related Protobuf UT run well in IDE
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b75325ccefa6 [SPARK-47378][PROTOBUF][TESTS] Make the related Protobuf UT run well in IDE b75325ccefa6 is described below commit b75325ccefa67b0c2daee317264808c67d76854f Author: panbingkun AuthorDate: Wed Mar 13 09:56:13 2024 -0700 [SPARK-47378][PROTOBUF][TESTS] Make the related Protobuf UT run well in IDE ### What changes were proposed in this pull request? The PR aims to make the related Protobuf `UT` run well in the IDE (IntelliJ IDEA). ### Why are the changes needed? This makes it easier for developers to debug the related Protobuf `UT`. Before: https://github.com/apache/spark/assets/15246973/c00781b2-3477-4b2c-b871-ead997fda697 After: https://github.com/apache/spark/assets/15246973/665fc67d-c69e-45c7-b37d-bb4ef8e72930 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manual test. - Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45498 from panbingkun/SPARK-47378. 
Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- .../test/scala/org/apache/spark/sql/protobuf/ProtobufTestBase.scala | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufTestBase.scala b/connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufTestBase.scala index e3add49f2b80..b53ba947216a 100644 --- a/connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufTestBase.scala +++ b/connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufTestBase.scala @@ -28,6 +28,9 @@ import org.apache.spark.sql.types.{DataType, StructType} trait ProtobufTestBase extends SQLTestUtils { + private val descriptorDir = getWorkspaceFilePath( +"connector", "protobuf", "target", "generated-test-sources") + /** * Returns path for a Protobuf descriptor file used in the tests. These files are generated * during the build. Maven and SBT create the descriptor files differently. Maven creates one @@ -35,7 +38,7 @@ trait ProtobufTestBase extends SQLTestUtils { * all the Protobuf files. As a result actual file path returned in each case is different. */ protected def protobufDescriptorFile(fileName: String): String = { -val dir = "target/generated-test-sources" +val dir = descriptorDir.toFile.getCanonicalPath if (new File(s"$dir/$fileName").exists) { s"$dir/$fileName" } else { // sbt test - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
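The essence of the fix above is path resolution: the old code built `target/generated-test-sources` relative to the process working directory, which matches the module directory under SBT/Maven but is typically the repository root when tests launch from an IDE. A minimal sketch of resolving from an explicit workspace root — the `workspaceRoot` parameter is an assumption here; Spark derives it via `getWorkspaceFilePath`:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class DescriptorPath {
    // Resolve the generated-descriptor directory from an explicit workspace root
    // instead of the current working directory, so the path is stable regardless
    // of where the test JVM was launched. The real method also falls back to
    // sbt's different output layout, which is omitted in this sketch.
    static String protobufDescriptorFile(Path workspaceRoot, String fileName) {
        return workspaceRoot
            .resolve(Paths.get("connector", "protobuf", "target", "generated-test-sources"))
            .resolve(fileName)
            .toAbsolutePath()
            .normalize()
            .toString();
    }

    public static void main(String[] args) {
        // e.g. /workspace/connector/protobuf/target/generated-test-sources/functions.desc
        System.out.println(protobufDescriptorFile(Paths.get("/workspace"), "functions.desc"));
    }
}
```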
(spark) branch master updated: [SPARK-47373][SQL] Match FileSourceScanLike to get metadata instead of FileSourceScanExec
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7d0f0832ae1a [SPARK-47373][SQL] Match FileSourceScanLike to get metadata instead of FileSourceScanExec 7d0f0832ae1a is described below commit 7d0f0832ae1a222bd9c2492587b37fc1939a51e5 Author: zwangsheng AuthorDate: Wed Mar 13 00:28:26 2024 -0700 [SPARK-47373][SQL] Match FileSourceScanLike to get metadata instead of FileSourceScanExec ### What changes were proposed in this pull request? When getting Spark plan info, we should match the base trait `FileSourceScanLike` to get metadata instead of matching the subclass `FileSourceScanExec`. That way, user-defined file scan operators (which extend `FileSourceScanLike`) can also be matched. ### Why are the changes needed? To match user-defined file scan operators. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing unit tests ### Was this patch authored or co-authored using generative AI tooling? No Closes #45491 from zwangsheng/SPARK-47373. 
Authored-by: zwangsheng Signed-off-by: Dongjoon Hyun --- .../src/main/scala/org/apache/spark/sql/execution/SparkPlanInfo.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanInfo.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanInfo.scala index 9b699801c97a..7c45b02ee846 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanInfo.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanInfo.scala @@ -66,7 +66,7 @@ private[execution] object SparkPlanInfo { // dump the file scan metadata (e.g file path) to event log val metadata = plan match { - case fileScan: FileSourceScanExec => fileScan.metadata + case fileScan: FileSourceScanLike => fileScan.metadata case _ => Map[String, String]() } new SparkPlanInfo( - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
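The one-line change above is the classic "match the interface, not the implementation" fix. The sketch below models it with illustrative stand-ins for Spark's physical-plan hierarchy (the class and method names are hypothetical simplifications):

```java
import java.util.Collections;
import java.util.Map;

public class PlanInfoSketch {
    // Stand-in for Spark's FileSourceScanLike trait.
    interface FileSourceScanLike { Map<String, String> metadata(); }

    // Stand-in for Spark's built-in concrete scan node.
    static final class FileSourceScanExec implements FileSourceScanLike {
        @Override public Map<String, String> metadata() { return Map.of("Location", "/data/t"); }
    }

    // A user-defined scan operator: it extends the trait, not the concrete class.
    static final class MyCustomFileScan implements FileSourceScanLike {
        @Override public Map<String, String> metadata() { return Map.of("Location", "/data/custom"); }
    }

    // Matching on the base trait (the fix) captures both scan nodes above;
    // matching on FileSourceScanExec (the old code) would miss MyCustomFileScan.
    static Map<String, String> metadataOf(Object plan) {
        if (plan instanceof FileSourceScanLike) {
            return ((FileSourceScanLike) plan).metadata();
        }
        return Collections.emptyMap();
    }

    public static void main(String[] args) {
        System.out.println(metadataOf(new MyCustomFileScan())); // {Location=/data/custom}
        System.out.println(metadataOf("not a scan node"));      // {}
    }
}
```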
(spark) branch master updated: [SPARK-47349][SQL][TESTS] Refactor string function `startsWith` and `endsWith` tests
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 806f8e0466b9 [SPARK-47349][SQL][TESTS] Refactor string function `startsWith` and `endsWith` tests 806f8e0466b9 is described below commit 806f8e0466b968d3fe87c7bbe3326bdf5458677a Author: Stevo Mitric AuthorDate: Tue Mar 12 16:54:55 2024 -0700 [SPARK-47349][SQL][TESTS] Refactor string function `startsWith` and `endsWith` tests ### What changes were proposed in this pull request? Refactored tests inside `CollationSuite` by migrating `startsWith` and `endsWith` tests into new `UTF8StringWithCollationSuite` suite that does unit string-level tests. Changes originally proposed in [this PR](https://github.com/apache/spark/pull/45421#discussion_r1519451854). ### Why are the changes needed? Removes cluttering of `CollationSuite`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Test suite proposed in this PR ### Was this patch authored or co-authored using generative AI tooling? No Closes #45477 from stevomitric/stevomitric/string-function-tests. Authored-by: Stevo Mitric Signed-off-by: Dongjoon Hyun --- .../unsafe/types/UTF8StringWithCollationSuite.java | 103 + .../org/apache/spark/sql/CollationSuite.scala | 60 +--- 2 files changed, 105 insertions(+), 58 deletions(-) diff --git a/common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringWithCollationSuite.java b/common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringWithCollationSuite.java new file mode 100644 index ..b60da7b945a4 --- /dev/null +++ b/common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringWithCollationSuite.java @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. 
See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.unsafe.types; + +import org.apache.spark.SparkException; +import org.apache.spark.sql.catalyst.util.CollationFactory; +import org.junit.jupiter.api.Test; + +import static org.junit.jupiter.api.Assertions.*; + + +public class UTF8StringWithCollationSuite { + + private void assertStartsWith(String pattern, String prefix, String collationName, boolean value) + throws SparkException { + assertEquals(UTF8String.fromString(pattern).startsWith(UTF8String.fromString(prefix), +CollationFactory.collationNameToId(collationName)), value); + } + + private void assertEndsWith(String pattern, String suffix, String collationName, boolean value) + throws SparkException { + assertEquals(UTF8String.fromString(pattern).endsWith(UTF8String.fromString(suffix), +CollationFactory.collationNameToId(collationName)), value); + } + + @Test + public void startsWithTest() throws SparkException { +assertStartsWith("", "", "UTF8_BINARY", true); +assertStartsWith("c", "", "UTF8_BINARY", true); +assertStartsWith("", "c", "UTF8_BINARY", false); +assertStartsWith("abcde", "a", "UTF8_BINARY", true); +assertStartsWith("abcde", "A", "UTF8_BINARY", false); +assertStartsWith("abcde", "bcd", "UTF8_BINARY", false); +assertStartsWith("abcde", "BCD", "UTF8_BINARY", false); +assertStartsWith("", 
"", "UNICODE", true); +assertStartsWith("c", "", "UNICODE", true); +assertStartsWith("", "c", "UNICODE", false); +assertStartsWith("abcde", "a", "UNICODE", true); +assertStartsWith("abcde", "A", "UNICODE", false); +assertStartsWith("abcde", "bcd", "UNICODE", false); +assertStartsWith("abcde", "BCD", "UNICODE", false); +assertStartsWi
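A simplified model of what these unit tests assert: a collation-aware `startsWith` compares the prefix-length substring of the input under the collation's comparator. This sketch deliberately ignores collations where several code points collate as a single unit (which Spark's `CollationFactory` handles properly) and uses `String.CASE_INSENSITIVE_ORDER` as a stand-in for a hypothetical case-insensitive collation; it is an illustration, not Spark's implementation:

```java
import java.util.Comparator;

public class CollatedPrefix {
    // Prefix check under a collation modeled as a String comparator.
    // Simplification: assumes one char collates as one unit.
    static boolean startsWith(String s, String prefix, Comparator<String> collation) {
        if (prefix.length() > s.length()) return false;
        return collation.compare(s.substring(0, prefix.length()), prefix) == 0;
    }

    public static void main(String[] args) {
        Comparator<String> binary = Comparator.naturalOrder();               // like UTF8_BINARY
        Comparator<String> caseInsensitive = String.CASE_INSENSITIVE_ORDER;  // hypothetical collation
        System.out.println(startsWith("abcde", "A", binary));          // false (case-sensitive)
        System.out.println(startsWith("abcde", "A", caseInsensitive)); // true
        System.out.println(startsWith("", "c", binary));               // false (prefix longer than input)
        System.out.println(startsWith("c", "", binary));               // true  (empty prefix)
    }
}
```

The migrated suite tests exactly this kind of string-level behavior directly on `UTF8String`, without spinning up a Spark session as `CollationSuite` did.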
(spark) branch master updated: [SPARK-47364][CORE] Make `PluginEndpoint` warn when plugins reply for one-way message
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8fcef1657a02 [SPARK-47364][CORE] Make `PluginEndpoint` warn when plugins reply for one-way message 8fcef1657a02 is described below commit 8fcef1657a02189f91d5485eabb5b165706cdce9 Author: Dongjoon Hyun AuthorDate: Tue Mar 12 12:44:01 2024 -0700 [SPARK-47364][CORE] Make `PluginEndpoint` warn when plugins reply for one-way message ### What changes were proposed in this pull request? This PR aims to make `PluginEndpoint` warn when plugins reply for one-way message. ### Why are the changes needed? Previously, it logged `INFO` level messages. Sometimes, these took up 66% of driver INFO logs. We had better increase the log level to prompt users to fix the issue. ### Does this PR introduce _any_ user-facing change? No. Only a log level. ### How was this patch tested? Manually. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45478 from dongjoon-hyun/SPARK-47364. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../main/scala/org/apache/spark/internal/plugin/PluginEndpoint.scala| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/core/src/main/scala/org/apache/spark/internal/plugin/PluginEndpoint.scala b/core/src/main/scala/org/apache/spark/internal/plugin/PluginEndpoint.scala index 989ef8f2edf2..bc45aefa560e 100644 --- a/core/src/main/scala/org/apache/spark/internal/plugin/PluginEndpoint.scala +++ b/core/src/main/scala/org/apache/spark/internal/plugin/PluginEndpoint.scala @@ -35,7 +35,7 @@ private class PluginEndpoint( try { val reply = plugin.receive(message) if (reply != null) { - logInfo( + logWarning( s"Plugin $pluginName returned reply for one-way message of type " + s"${message.getClass().getName()}.") } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-46919][BUILD][CONNECT] Upgrade `grpcio*` and `grpc-java` to 1.62.x
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6e5d1db9058d [SPARK-46919][BUILD][CONNECT] Upgrade `grpcio*` and `grpc-java` to 1.62.x 6e5d1db9058d is described below commit 6e5d1db9058de62a45f35d3f41e028a72f688b70 Author: yangjie01 AuthorDate: Tue Mar 12 08:00:37 2024 -0700 [SPARK-46919][BUILD][CONNECT] Upgrade `grpcio*` and `grpc-java` to 1.62.x ### What changes were proposed in this pull request? This PR aims to upgrade `grpcio*` from 1.59.3 to [1.62.0](https://pypi.org/project/grpcio/1.62.0/) and `grpc-java` from 1.59.0 to 1.62.2 for Apache Spark 4.0.0. ### Why are the changes needed? grpc 1.60.0 started to support dualstack IPv4 and IPv6 backends: - Implemented dualstack IPv4 and IPv6 backend support, as per draft gRFC A61. xDS support currently guarded by GRPC_EXPERIMENTAL_XDS_DUALSTACK_ENDPOINTS env var. 
Note that in `grpc-java` 1.61.0, since the dependency scope of `grpc-protobuf` on `grpc-protobuf-lite` has been changed from `compile` to `runtime`, we need to manually configure the dependency of the `connect` module on `grpc-protobuf-lite` and explicitly exclude the dependency on `protobuf-javalite` because `SparkConnectService` uses `io.grpc.protobuf.lite.ProtoLiteUtils` - https://github.com/grpc/grpc-java/pull/10756/files The relevant release notes are as follows: - https://github.com/grpc/grpc/releases/tag/v1.60.0 - https://github.com/grpc/grpc/releases/tag/v1.60.1 - https://github.com/grpc/grpc/releases/tag/v1.61.0 - https://github.com/grpc/grpc/releases/tag/v1.61.1 - https://github.com/grpc/grpc/releases/tag/v1.62.0 - https://github.com/grpc/grpc-java/releases/tag/v1.60.0 - https://github.com/grpc/grpc-java/releases/tag/v1.60.1 - https://github.com/grpc/grpc-java/releases/tag/v1.61.0 - https://github.com/grpc/grpc-java/releases/tag/v1.61.1 - https://github.com/grpc/grpc-java/releases/tag/v1.62.2 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #44929 from LuciferYang/grpc-16. 
Lead-authored-by: yangjie01 Co-authored-by: YangJie Signed-off-by: Dongjoon Hyun --- .github/workflows/build_and_test.yml | 4 ++-- .github/workflows/maven_test.yml | 2 +- connector/connect/common/src/main/buf.gen.yaml | 4 ++-- connector/connect/server/pom.xml | 11 +++ dev/create-release/spark-rm/Dockerfile | 2 +- dev/infra/Dockerfile | 2 +- dev/requirements.txt | 4 ++-- pom.xml| 2 +- project/SparkBuild.scala | 2 +- python/docs/source/getting_started/install.rst | 4 ++-- python/setup.py| 2 +- 11 files changed, 25 insertions(+), 14 deletions(-) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index 4f2be1c04f98..faa495fe5dfc 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -252,7 +252,7 @@ jobs: - name: Install Python packages (Python 3.9) if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-')) || contains(matrix.modules, 'connect') run: | -python3.9 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy unittest-xml-reporting 'lxml==4.9.4' 'grpcio==1.59.3' 'grpcio-status==1.59.3' 'protobuf==4.25.1' +python3.9 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy unittest-xml-reporting 'lxml==4.9.4' 'grpcio==1.62.0' 'grpcio-status==1.62.0' 'protobuf==4.25.1' python3.9 -m pip list # Run the tests. - name: Run tests @@ -702,7 +702,7 @@ jobs: python3.9 -m pip install 'sphinx==4.5.0' mkdocs 'pydata_sphinx_theme>=0.13' sphinx-copybutton nbsphinx numpydoc jinja2 markupsafe 'pyzmq<24.0.0' \ ipython ipython_genutils sphinx_plotly_directive 'numpy>=1.20.0' pyarrow pandas 'plotly>=4.8' 'docutils<0.18.0' \ 'flake8==3.9.0' 'mypy==1.8.0' 'pytest==7.1.3' 'pytest-mypy-plugins==1.9.3' 'black==23.9.1' \ - 'pandas-stubs==1.2.0.53' 'grpcio==1.59.3' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0' \ + 'pandas-stubs==1.2.0.53' 'grpcio==1.62.0'
(spark) branch master updated: [SPARK-47342][SQL] Support TimestampNTZ for DB2 TIMESTAMP WITH TIME ZONE
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e8bc176e6fd1 [SPARK-47342][SQL] Support TimestampNTZ for DB2 TIMESTAMP WITH TIME ZONE e8bc176e6fd1 is described below commit e8bc176e6fd145bab4cde6bf38931a7ad4c7eecd Author: Kent Yao AuthorDate: Tue Mar 12 07:33:24 2024 -0700 [SPARK-47342][SQL] Support TimestampNTZ for DB2 TIMESTAMP WITH TIME ZONE ### What changes were proposed in this pull request? This PR Supports TimestampNTZ for DB2 TIMESTAMP WITH TIME ZONE when `preferTimestampNTZ` option is set to true by users ### Why are the changes needed? improve DB2 connector ### Does this PR introduce _any_ user-facing change? yes, preferTimestampNTZ works for DB2 TIMESTAMP WITH TIME ZONE ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #45471 from yaooqinn/SPARK-47342. 
Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala| 14 ++ .../spark/sql/execution/datasources/jdbc/JdbcUtils.scala | 10 -- .../main/scala/org/apache/spark/sql/jdbc/DB2Dialect.scala | 2 +- .../scala/org/apache/spark/sql/jdbc/JdbcDialects.scala | 7 +++ .../scala/org/apache/spark/sql/jdbc/PostgresDialect.scala | 13 + .../test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala | 9 +++-- 6 files changed, 42 insertions(+), 13 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala index cedb33d491fb..14776047cec4 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala @@ -19,6 +19,7 @@ package org.apache.spark.sql.jdbc import java.math.BigDecimal import java.sql.{Connection, Date, Timestamp} +import java.time.LocalDateTime import java.util.Properties import org.scalatest.time.SpanSugar._ @@ -224,4 +225,17 @@ class DB2IntegrationSuite extends DockerJDBCIntegrationSuite { assert(actual === expected) } + + test("SPARK-47342: Support TimestampNTZ for DB2 TIMESTAMP WITH TIME ZONE") { +// The test only covers TIMESTAMP WITHOUT TIME ZONE so far, we shall support +// TIMESTAMP WITH TIME ZONE but I don't figure it out to mock a TSTZ value.
+withDefaultTimeZone(UTC) { + val df = spark.read.format("jdbc") +.option("url", jdbcUrl) +.option("preferTimestampNTZ", "true") +.option("query", "select ts from dates") +.load() + checkAnswer(df, Row(LocalDateTime.of(2009, 2, 13, 23, 31, 30))) +} + } } diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala index a7bbb832a839..27c032471b57 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala @@ -212,8 +212,7 @@ object JdbcUtils extends Logging with SQLConfHelper { case java.sql.Types.SQLXML => StringType case java.sql.Types.STRUCT => StringType case java.sql.Types.TIME => TimestampType -case java.sql.Types.TIMESTAMP if isTimestampNTZ => TimestampNTZType -case java.sql.Types.TIMESTAMP => TimestampType +case java.sql.Types.TIMESTAMP => getTimestampType(isTimestampNTZ) case java.sql.Types.TINYINT => IntegerType case java.sql.Types.VARBINARY => BinaryType case java.sql.Types.VARCHAR if conf.charVarcharAsString => StringType @@ -229,6 +228,13 @@ object JdbcUtils extends Logging with SQLConfHelper { throw QueryExecutionErrors.unrecognizedSqlTypeError(jdbcType, typeName) } + /** + * Return TimestampNTZType if isTimestampNTZ; otherwise TimestampType. + */ + def getTimestampType(isTimestampNTZ: Boolean): DataType = { +if (isTimestampNTZ) TimestampNTZType else TimestampType + } + /** * Returns the schema if the table already exists in the JDBC database. */ diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/DB2Dialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/DB2Dialect.scala index 62c31b1c4c5d..ff3e74eae205 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/DB2Dialect.scala +++ b/sql/core/src/main/scala/org/
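The refactoring above is small enough to restate. As a hypothetical Python sketch (not Spark code) of the rule that `getCatalystType` now delegates to `getTimestampType`:

```python
# java.sql.Types.TIMESTAMP, as defined by the JDBC specification.
JDBC_TIMESTAMP = 93

def get_timestamp_type(is_timestamp_ntz: bool) -> str:
    # Mirrors JdbcUtils.getTimestampType: return the "no time zone" type
    # when preferTimestampNTZ is set, otherwise the session-zone type.
    return "TimestampNTZType" if is_timestamp_ntz else "TimestampType"

def map_jdbc_type(jdbc_code: int, prefer_timestamp_ntz: bool) -> str:
    # Simplified stand-in for the TIMESTAMP branch of getCatalystType.
    if jdbc_code == JDBC_TIMESTAMP:
        return get_timestamp_type(prefer_timestamp_ntz)
    raise ValueError(f"unrecognized SQL type: {jdbc_code}")
```

Factoring the rule into one helper is what lets dialects such as `DB2Dialect` reuse it for TIMESTAMP WITH TIME ZONE, which is what the commit does on the Scala side.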
(spark) branch master updated: [SPARK-47335][BUILD] Upgrade `mvn-scalafmt` to `1.1.1684076452.9f83818` & `scalafmt` to `3.8.0`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 72ecba538406 [SPARK-47335][BUILD] Upgrade `mvn-scalafmt` to `1.1.1684076452.9f83818` & `scalafmt` to `3.8.0` 72ecba538406 is described below commit 72ecba5384060720b114037bec70ff4328625889 Author: panbingkun AuthorDate: Tue Mar 12 07:32:18 2024 -0700 [SPARK-47335][BUILD] Upgrade `mvn-scalafmt` to `1.1.1684076452.9f83818` & `scalafmt` to `3.8.0` ### What changes were proposed in this pull request? The pr aims to upgrade `mvn-scalafmt` from `1.1.1640084764.9f463a9` to `1.1.1684076452.9f83818`. ### Why are the changes needed? - mvn-scalafmt The last `mvn-scalafmt` upgrade occurred 1 year ago, https://github.com/apache/spark/pull/37727 The latest version of `mvn-scalafmt` release notes: https://github.com/SimonJPegg/mvn_scalafmt/releases/tag/2.13-1.1.1684076452.9f83818 - scalafmt The latest version of `scalafmt` release notes: https://github.com/scalameta/scalafmt/releases/tag/v3.8.0 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. ``` ./build/mvn scalafmt:format -Dscalafmt.skip=false ... [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 2.216 s [INFO] Finished at: 2024-03-10T20:30:11+08:00 [INFO] ``` ``` ./dev/scalafmt ... [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 01:56 min [INFO] Finished at: 2024-03-10T20:19:46+08:00 [INFO] ``` - Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45452 from panbingkun/SPARK-47335. 
Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/.scalafmt.conf | 2 +- pom.xml| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/.scalafmt.conf b/dev/.scalafmt.conf index b3a43a03651a..6d1ab0243dc5 100644 --- a/dev/.scalafmt.conf +++ b/dev/.scalafmt.conf @@ -32,4 +32,4 @@ fileOverride { runner.dialect = scala213 } } -version = 3.7.17 +version = 3.8.0 diff --git a/pom.xml b/pom.xml index 146ded53dd8d..49a951405408 100644 --- a/pom.xml +++ b/pom.xml @@ -3564,7 +3564,7 @@ org.antipathy mvn-scalafmt_${scala.binary.version} -1.1.1640084764.9f463a9 +1.1.1684076452.9f83818 ${scalafmt.validateOnly} ${scalafmt.skip} - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (f97da1638062 -> f40c693ad7fd)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from f97da1638062 [SPARK-45245][CONNECT][TESTS][FOLLOW-UP] Remove unneeded Matchers trait in the test add f40c693ad7fd [SPARK-47339][BUILD] Upgrade checkStyle to `10.14.0` No new revisions were added by this update. Summary of changes: launcher/src/main/java/org/apache/spark/launcher/SparkAppHandle.java | 2 +- pom.xml | 2 +- project/plugins.sbt | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-)
(spark) branch master updated (610840e27e2e -> f97da1638062)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 610840e27e2e [SPARK-45827][SQL][FOLLOWUP] Fix for collation add f97da1638062 [SPARK-45245][CONNECT][TESTS][FOLLOW-UP] Remove unneeded Matchers trait in the test No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/api/python/PythonWorkerFactorySuite.scala | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
(spark) branch master updated (10be03215775 -> 610840e27e2e)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 10be03215775 [SPARK-47255][SQL] Assign names to the error classes _LEGACY_ERROR_TEMP_323[6-7] and _LEGACY_ERROR_TEMP_324[7-9] add 610840e27e2e [SPARK-45827][SQL][FOLLOWUP] Fix for collation No new revisions were added by this update. Summary of changes: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala | 2 +- .../sql/execution/datasources/SaveIntoDataSourceCommandSuite.scala | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-)
(spark) branch master updated: [SPARK-44115][BUILD] Upgrade Apache ORC to 2.0.0
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 76b1c122cb7d [SPARK-44115][BUILD] Upgrade Apache ORC to 2.0.0 76b1c122cb7d is described below commit 76b1c122cb7d77e8f175b25b935b9296a669d5d8 Author: Dongjoon Hyun AuthorDate: Fri Mar 8 13:31:10 2024 -0800 [SPARK-44115][BUILD] Upgrade Apache ORC to 2.0.0 ### What changes were proposed in this pull request? This PR aims to Upgrade Apache ORC to 2.0.0 for Apache Spark 4.0.0. Apache ORC community has 3-year support policy which is longer than Apache Spark. It's aligned like the following. - Apache ORC 2.0.x <-> Apache Spark 4.0.x - Apache ORC 1.9.x <-> Apache Spark 3.5.x - Apache ORC 1.8.x <-> Apache Spark 3.4.x - Apache ORC 1.7.x (Supported) <-> Apache Spark 3.3.x (End-Of-Support) ### Why are the changes needed? **Release Note** - https://github.com/apache/orc/releases/tag/v2.0.0 **Milestone** - https://github.com/apache/orc/milestone/20?closed=1 - https://github.com/apache/orc/pull/1728 - https://github.com/apache/orc/issues/1801 - https://github.com/apache/orc/issues/1498 - https://github.com/apache/orc/pull/1627 - https://github.com/apache/orc/issues/1497 - https://github.com/apache/orc/pull/1509 - https://github.com/apache/orc/pull/1554 - https://github.com/apache/orc/pull/1708 - https://github.com/apache/orc/pull/1733 - https://github.com/apache/orc/pull/1760 - https://github.com/apache/orc/pull/1743 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45443 from dongjoon-hyun/SPARK-44115. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 7 --- pom.xml | 17 - sql/core/pom.xml | 5 + 3 files changed, 25 insertions(+), 4 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 7e56e8914435..6b357b3e4b70 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -227,9 +227,10 @@ opencsv/2.3//opencsv-2.3.jar opentracing-api/0.33.0//opentracing-api-0.33.0.jar opentracing-noop/0.33.0//opentracing-noop-0.33.0.jar opentracing-util/0.33.0//opentracing-util-0.33.0.jar -orc-core/1.9.2/shaded-protobuf/orc-core-1.9.2-shaded-protobuf.jar -orc-mapreduce/1.9.2/shaded-protobuf/orc-mapreduce-1.9.2-shaded-protobuf.jar -orc-shims/1.9.2//orc-shims-1.9.2.jar +orc-core/2.0.0/shaded-protobuf/orc-core-2.0.0-shaded-protobuf.jar +orc-format/1.0.0/shaded-protobuf/orc-format-1.0.0-shaded-protobuf.jar +orc-mapreduce/2.0.0/shaded-protobuf/orc-mapreduce-2.0.0-shaded-protobuf.jar +orc-shims/2.0.0//orc-shims-2.0.0.jar oro/2.0.8//oro-2.0.8.jar osgi-resource-locator/1.0.3//osgi-resource-locator-1.0.3.jar paranamer/2.8//paranamer-2.8.jar diff --git a/pom.xml b/pom.xml index 9f1c9ed13f23..404f37be1b5a 100644 --- a/pom.xml +++ b/pom.xml @@ -141,7 +141,7 @@ 10.16.1.1 1.13.1 -1.9.2 +2.0.0 shaded-protobuf 11.0.20 5.0.0 @@ -2593,6 +2593,13 @@ + +org.apache.orc +orc-format +1.0.0 +${orc.classifier} +${orc.deps.scope} + org.apache.orc orc-core @@ -2600,6 +2607,14 @@ ${orc.classifier} ${orc.deps.scope} + +org.apache.orc +orc-format + + +com.aayushatharva.brotli4j +brotli4j + org.apache.hadoop hadoop-common diff --git a/sql/core/pom.xml b/sql/core/pom.xml index 0ad9e0f690c7..05f906206e5e 100644 --- a/sql/core/pom.xml +++ b/sql/core/pom.xml @@ -93,6 +93,11 @@ org.scala-lang.modules scala-parallel-collections_${scala.binary.version} + + org.apache.orc + orc-format + ${orc.classifier} + org.apache.orc orc-core - To unsubscribe, e-mail: 
commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (db0e5c7bc464 -> 35bced42474e)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from db0e5c7bc464 [SPARK-47269][BUILD] Upgrade jetty to 11.0.20 add 35bced42474e [SPARK-47242][BUILD] Bump ap-loader 3.0(v8) to support for async-profiler 3.0 No new revisions were added by this update. Summary of changes: connector/profiler/pom.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated (6b5917beff30 -> db0e5c7bc464)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 6b5917beff30 [SPARK-46961][SS] Using ProcessorContext to store and retrieve handle add db0e5c7bc464 [SPARK-47269][BUILD] Upgrade jetty to 11.0.20 No new revisions were added by this update. Summary of changes: dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++-- pom.xml | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-)
(spark) branch master updated (1cd7bab5c5c2 -> 22f9a5a25304)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 1cd7bab5c5c2 [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input add 22f9a5a25304 [SPARK-47235][CORE][TESTS] Disable `deleteRecursivelyUsingUnixNative` in Apple Silicon test env No new revisions were added by this update. Summary of changes: .../utils/src/main/java/org/apache/spark/network/util/JavaUtils.java | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
(spark) branch branch-3.5 updated: [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 9770016b180b [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input 9770016b180b is described below commit 9770016b180b0477060777d3739a2bfaabc6fcb3 Author: Dongjoon Hyun AuthorDate: Thu Feb 29 19:08:15 2024 -0800 [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input ### What changes were proposed in this pull request? This PR aims to fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input. ### Why are the changes needed? `deleteRecursivelyUsingJavaIO` is a fallback of `deleteRecursivelyUsingUnixNative`. We should have identical capability. Currently, it fails. ``` [info] java.nio.file.NoSuchFileException: /Users/dongjoon/APACHE/spark-merge/target/tmp/spark-e264d853-42c0-44a2-9a30-22049522b04f [info] at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) [info] at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) [info] at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) [info] at java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55) [info] at java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148) [info] at java.base/java.nio.file.Files.readAttributes(Files.java:1851) [info] at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:126) ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? This is difficult to test this `private static` Java method. I tested this with #45344 . ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45346 from dongjoon-hyun/SPARK-47236. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit 1cd7bab5c5c2bd8d595b131c88e6576486dbf123) Signed-off-by: Dongjoon Hyun --- common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java | 1 + 1 file changed, 1 insertion(+) diff --git a/common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java b/common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java index bbe764b8366c..d6603dcbee1a 100644 --- a/common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java +++ b/common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java @@ -120,6 +120,7 @@ public class JavaUtils { private static void deleteRecursivelyUsingJavaIO( File file, FilenameFilter filter) throws IOException { +if (!file.exists()) return; BasicFileAttributes fileAttributes = Files.readAttributes(file.toPath(), BasicFileAttributes.class); if (fileAttributes.isDirectory() && !isSymlink(file)) { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 58a4a49389a5 [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input 58a4a49389a5 is described below commit 58a4a49389a5f9979f7dabc5320116a212eb4bdb Author: Dongjoon Hyun AuthorDate: Thu Feb 29 19:08:15 2024 -0800 [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input ### What changes were proposed in this pull request? This PR aims to fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input. ### Why are the changes needed? `deleteRecursivelyUsingJavaIO` is a fallback of `deleteRecursivelyUsingUnixNative`. We should have identical capability. Currently, it fails. ``` [info] java.nio.file.NoSuchFileException: /Users/dongjoon/APACHE/spark-merge/target/tmp/spark-e264d853-42c0-44a2-9a30-22049522b04f [info] at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) [info] at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) [info] at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) [info] at java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55) [info] at java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148) [info] at java.base/java.nio.file.Files.readAttributes(Files.java:1851) [info] at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:126) ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? This is difficult to test this `private static` Java method. I tested this with #45344 . ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45346 from dongjoon-hyun/SPARK-47236. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit 1cd7bab5c5c2bd8d595b131c88e6576486dbf123) Signed-off-by: Dongjoon Hyun --- .../src/main/java/org/apache/spark/network/util/JavaUtils.java | 1 + 1 file changed, 1 insertion(+) diff --git a/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java b/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java index 7e410e9eab22..59744ec5748a 100644 --- a/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java +++ b/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java @@ -124,6 +124,7 @@ public class JavaUtils { private static void deleteRecursivelyUsingJavaIO( File file, FilenameFilter filter) throws IOException { +if (!file.exists()) return; BasicFileAttributes fileAttributes = Files.readAttributes(file.toPath(), BasicFileAttributes.class); if (fileAttributes.isDirectory() && !isSymlink(file)) { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (cc0ea60d6eee -> 1cd7bab5c5c2)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from cc0ea60d6eee [SPARK-47218][SQL] XML: Ignore commented row tags in XML tokenizer add 1cd7bab5c5c2 [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input No new revisions were added by this update. Summary of changes: common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java | 1 + 1 file changed, 1 insertion(+)
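The one-line guard added by SPARK-47236 in the commits above is easy to illustrate. A hypothetical Python sketch (not Spark code) of a recursive delete whose plain-I/O path mirrors the native fallback by ignoring a non-existing input:

```python
import os
import shutil

def delete_recursively(path: str) -> None:
    # The fix: return early when the path does not exist, instead of
    # failing the way Files.readAttributes raises NoSuchFileException.
    if not os.path.lexists(path):
        return
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)
    else:
        os.remove(path)
```

Deleting the same directory twice is then a no-op the second time, rather than an error.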
(spark) branch master updated (70007c59177a -> 9ce43c85a5d2)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 70007c59177a [SPARK-47186][DOCKER][FOLLOWUP] Reduce test time for docker ITs add 9ce43c85a5d2 [SPARK-47229][CORE][SQL][SS][YARN][CONNECT] Change the never changed `var` to `val` No new revisions were added by this update. Summary of changes: .../sql/connect/planner/SparkConnectServiceSuite.scala | 2 +- .../scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 2 +- .../spark/executor/CoarseGrainedExecutorBackend.scala| 2 +- core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala | 2 +- .../org/apache/spark/resource/ResourceProfileSuite.scala | 2 +- .../apache/spark/scheduler/TaskSchedulerImplSuite.scala | 2 +- .../apache/spark/shuffle/ShuffleBlockPusherSuite.scala | 2 +- .../main/scala/org/apache/spark/deploy/yarn/Client.scala | 2 +- .../apache/spark/sql/catalyst/parser/AstBuilder.scala| 2 +- .../spark/sql/catalyst/analysis/AnalysisErrorSuite.scala | 8 .../catalyst/expressions/StringExpressionsSuite.scala| 16 .../spark/sql/execution/streaming/state/RocksDB.scala| 2 +- .../scala/org/apache/spark/sql/ConfigBehaviorSuite.scala | 2 +- .../datasources/parquet/ParquetVectorizedSuite.scala | 2 +- .../sql/hive/thriftserver/HiveThriftServer2Suites.scala | 2 +- 15 files changed, 25 insertions(+), 25 deletions(-)
(spark) branch master updated (813934c69df6 -> 70007c59177a)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 813934c69df6 [SPARK-47015][SQL] Disable partitioning on collated columns add 70007c59177a [SPARK-47186][DOCKER][FOLLOWUP] Reduce test time for docker ITs No new revisions were added by this update. Summary of changes: .../sql/jdbc/DockerJDBCIntegrationSuite.scala | 39 -- 1 file changed, 14 insertions(+), 25 deletions(-)
(spark) branch master updated: [SPARK-41392][BUILD][TESTS] Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 28fd3de0fea0 [SPARK-41392][BUILD][TESTS] Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0 28fd3de0fea0 is described below commit 28fd3de0fea0e952aa1494838d00185613389277 Author: yangjie01 AuthorDate: Thu Feb 29 07:56:29 2024 -0800 [SPARK-41392][BUILD][TESTS] Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0 ### What changes were proposed in this pull request? Adds bouncy-castle jdk18 artifacts to test builds in spark-sql. Based on #38974 * only applies the test import changes * dependencies are those of #44359 ### Why are the changes needed? Forthcoming Hadoop 3.4.0 release doesn't export the bouncy-castle JARs; maven builds fail. ### Does this PR introduce _any_ user-facing change? No: test time dependency declarations only. ### How was this patch tested? This was done through the release build/test project https://github.com/apache/hadoop-release-support 1. Latest RC2 artifacts pulled from apache maven staging 2. Spark maven build triggered with the hadoop-version passed down. 3. The 3.3.6 release template worked with spark master (as it should!) 4. With this change the 3.4.0 RC build worked with this change Note: have not *yet* done a maven test run through this yet ### Was this patch authored or co-authored using generative AI tooling? No Closes #45317 from steveloughran/SPARK-41392-HADOOP-3.4.0. 
Authored-by: yangjie01 Signed-off-by: Dongjoon Hyun --- sql/core/pom.xml | 12 1 file changed, 12 insertions(+) diff --git a/sql/core/pom.xml b/sql/core/pom.xml index 8b1b51352a20..0ad9e0f690c7 100644 --- a/sql/core/pom.xml +++ b/sql/core/pom.xml @@ -223,6 +223,18 @@ htmlunit3-driver test + + + org.bouncycastle + bcprov-jdk18on + test + + + org.bouncycastle + bcpkix-jdk18on + test + target/scala-${scala.binary.version}/classes - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (a7825f6e8907 -> 919c19c008b8)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from a7825f6e8907 [SPARK-47227][DOCS] Improve documentation for Spark Connect add 919c19c008b8 [SPARK-47231][CORE][TESTS] FakeTask should reference its TaskMetrics to avoid TaskMetrics accumulators being GCed before stage completion No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/scheduler/FakeTask.scala| 18 +- 1 file changed, 13 insertions(+), 5 deletions(-)
(spark) branch master updated (944a00db6f83 -> a7825f6e8907)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 944a00db6f83 [SPARK-47224][PS][TESTS] Split `test_split_apply_basic` and `test_split_apply_adv` add a7825f6e8907 [SPARK-47227][DOCS] Improve documentation for Spark Connect No new revisions were added by this update. Summary of changes: docs/spark-connect-overview.md | 20 1 file changed, 20 insertions(+)
(spark) branch master updated: [SPARK-47215][CORE][TESTS] Reduce the number of required threads in `MasterSuite`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new da2ae29c9cab [SPARK-47215][CORE][TESTS] Reduce the number of required threads in `MasterSuite` da2ae29c9cab is described below commit da2ae29c9cabe336f95ab3737e97aa8a5bd33ada Author: Dongjoon Hyun AuthorDate: Wed Feb 28 16:21:26 2024 -0800 [SPARK-47215][CORE][TESTS] Reduce the number of required threads in `MasterSuite` ### What changes were proposed in this pull request? This PR aims to reduce the number of required threads in the `MasterSuite` test. ### Why are the changes needed? - https://github.com/apache/spark/actions/workflows/build_maven_java21_macos14.yml - https://github.com/apache/spark/actions/runs/8070641575/job/22086547398 ``` - SPARK-46881: scheduling with workerSelectionPolicy - CORES_FREE_ASC (true) - SPARK-46881: scheduling with workerSelectionPolicy - CORES_FREE_ASC (false) Warning: [3943.730s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 4096k, guardsize: 16k, detached. Warning: [3943.730s][warning][os,thread] Failed to start the native thread for java.lang.Thread "rpc-server-13566-3" Warning: [3943.730s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 4096k, guardsize: 16k, detached. Warning: [3943.730s][warning][os,thread] Failed to start the native thread for java.lang.Thread "globalEventExecutor-3-961" *** RUN ABORTED *** An exception or error caused a run to abort: unable to create native thread: possibly out of memory or process/resource limits reached java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested?
Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45320 from dongjoon-hyun/SPARK-47215. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/deploy/master/MasterSuite.scala | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala b/core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala index 9992c2020f27..b4981ca3d9c6 100644 --- a/core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala +++ b/core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala @@ -687,9 +687,9 @@ class MasterSuite extends SparkFunSuite private val workerSelectionPolicyTestCases = Seq( (CORES_FREE_ASC, true, List("10001", "10002")), (CORES_FREE_ASC, false, List("10001")), -(CORES_FREE_DESC, true, List("10004", "10005")), -(CORES_FREE_DESC, false, List("10005")), -(MEMORY_FREE_ASC, true, List("10001", "10005")), +(CORES_FREE_DESC, true, List("10002", "10003")), +(CORES_FREE_DESC, false, List("10003")), +(MEMORY_FREE_ASC, true, List("10001", "10003")), (MEMORY_FREE_ASC, false, List("10001")), (MEMORY_FREE_DESC, true, List("10002", "10003")), (MEMORY_FREE_DESC, false, Seq("10002")), @@ -701,11 +701,14 @@ class MasterSuite extends SparkFunSuite val conf = new SparkConf() .set(WORKER_SELECTION_POLICY.key, policy.toString) .set(SPREAD_OUT_APPS.key, spreadOut.toString) +.set(UI_ENABLED.key, "false") +.set(Network.RPC_NETTY_DISPATCHER_NUM_THREADS, 1) +.set(Network.RPC_IO_THREADS, 1) val master = makeAliveMaster(conf) // Use different core and memory values to simplify the tests MockWorker.counter.set(1) - (1 to 5).foreach { idx => + (1 to 3).foreach { idx => val worker = new MockWorker(master.self, conf) worker.rpcEnv.setupEndpoint(s"worker-$idx", worker) val workerReg = RegisterWorker( @@ -713,7 +716,7 @@ class MasterSuite extends SparkFunSuite "localhost", worker.self.address.port, worker.self, - idx * 
10, + 4 + idx, 10240 * (if (idx < 2) idx else (6 - idx)), "http://localhost:8080";, RpcAddress("localhost", 1)) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47207][CORE] Support `spark.driver.timeout` and `DriverTimeoutPlugin`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2a93e46eb062 [SPARK-47207][CORE] Support `spark.driver.timeout` and `DriverTimeoutPlugin` 2a93e46eb062 is described below commit 2a93e46eb0627df9cd288156bffa0a0815906c3c Author: Dongjoon Hyun AuthorDate: Wed Feb 28 09:27:53 2024 -0800 [SPARK-47207][CORE] Support `spark.driver.timeout` and `DriverTimeoutPlugin` ### What changes were proposed in this pull request? This PR aims to support `spark.driver.timeout` and `DriverTimeoutPlugin`. ### Why are the changes needed? Sometimes, Spark applications fall into an abnormal situation and hang. We should provide a standard way to guarantee termination after a pre-defined timeout. - spark.plugins=org.apache.spark.deploy.DriverTimeoutPlugin - spark.driver.timeout=1min ``` $ bin/spark-shell -c spark.plugins=org.apache.spark.deploy.DriverTimeoutPlugin -c spark.driver.timeout=1min Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 4.0.0-SNAPSHOT /_/ Using Scala version 2.13.12 (OpenJDK 64-Bit Server VM, Java 17.0.10) Type in expressions to have them evaluated. Type :help for more information. 24/02/28 06:53:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Spark context Web UI available at http://localhost:4040 Spark context available as 'sc' (master = local[*], app id = local-1709132014477). Spark session available as 'spark'. scala> 24/02/28 06:54:34 WARN DriverTimeoutDriverPlugin: Terminate Driver JVM because it runs after 1 minute $ echo $? 124 ``` ### Does this PR introduce _any_ user-facing change?
No, this is a new feature and a built-in plugin. ### How was this patch tested? Manually because this invokes `System.exit`. 1. Timeout with 1 minute ``` $ bin/spark-shell -c spark.plugins=org.apache.spark.deploy.DriverTimeoutPlugin -c spark.driver.timeout=1min ... scala> 24/02/28 06:54:34 WARN DriverTimeoutDriverPlugin: Terminate Driver JVM because it runs after 1 minute $ echo $? 124 ``` 2. `DriverTimeoutPlugin` will be ignored if the default value of `spark.driver.timeout` is used. ``` $ bin/spark-shell -c spark.plugins=org.apache.spark.deploy.DriverTimeoutPlugin ... 24/02/28 01:02:57 WARN DriverTimeoutDriverPlugin: Disabled with the timeout value 0. ... scala> ``` 3. `spark.driver.timeout` will be ignored if `DriverTimeoutPlugin` is not provided. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45313 from dongjoon-hyun/SPARK-47207. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../apache/spark/deploy/DriverTimeoutPlugin.scala | 62 ++ .../org/apache/spark/internal/config/package.scala | 9 .../org/apache/spark/util/SparkExitCode.scala | 3 ++ docs/configuration.md | 11 4 files changed, 85 insertions(+) diff --git a/core/src/main/scala/org/apache/spark/deploy/DriverTimeoutPlugin.scala b/core/src/main/scala/org/apache/spark/deploy/DriverTimeoutPlugin.scala new file mode 100644 index ..9b141d607572 --- /dev/null +++ b/core/src/main/scala/org/apache/spark/deploy/DriverTimeoutPlugin.scala @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy + +import java.util.{Map => JMap} +import java.util.concurrent.{ScheduledExecutorService, TimeUnit} + +import scala.jdk.Col
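The actual plugin is implemented in Scala (the source above is truncated), but the core mechanism it describes — schedule a JVM exit after a configurable timeout, and do nothing when the timeout is 0 — can be sketched as follows. This is an illustration only; the class and method names here are hypothetical and are not the real `DriverTimeoutPlugin` API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DriverTimeoutSketch {
    // 124 mirrors the exit code shown in the `echo $?` transcript above,
    // the same convention GNU `timeout` uses for a timed-out command.
    public static final int EXIT_TIMEOUT = 124;

    /** Returns false when the timeout is disabled (value <= 0), true when an exit was scheduled. */
    public static boolean scheduleTermination(long timeoutMinutes) {
        if (timeoutMinutes <= 0) {
            // Corresponds to the observed log: "Disabled with the timeout value 0."
            System.err.println("Disabled with the timeout value 0.");
            return false;
        }
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "driver-timeout");
            t.setDaemon(true); // a daemon thread must not keep the JVM alive on its own
            return t;
        });
        scheduler.schedule(() -> {
            System.err.println("Terminating driver JVM after " + timeoutMinutes + " minute(s)");
            System.exit(EXIT_TIMEOUT);
        }, timeoutMinutes, TimeUnit.MINUTES);
        return true;
    }

    public static void main(String[] args) {
        scheduleTermination(0); // prints the "disabled" warning and schedules nothing
    }
}
```

Using a daemon thread for the timer matches the behavior described in test case 3 above: if nothing else is configured, the timer alone never prevents a normal shutdown.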
(spark) branch master updated (7e7ba4eaf071 -> ea2587f695cf)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 7e7ba4eaf071 [MINOR][SQL] Remove out-of-dated comment in `CollectLimitExec` add ea2587f695cf [SPARK-47209][BUILD] Upgrade slf4j to 2.0.12 No new revisions were added by this update. Summary of changes: dev/deps/spark-deps-hadoop-3-hive-2.3 | 6 +++--- pom.xml | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-)
(spark) branch master updated: [SPARK-47199][PYTHON][TESTS] Add prefix into TemporaryDirectory to avoid flakiness
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6eed08cb0c12 [SPARK-47199][PYTHON][TESTS] Add prefix into TemporaryDirectory to avoid flakiness 6eed08cb0c12 is described below commit 6eed08cb0c12c46b9de4665ab130cea1695b9a5b Author: Hyukjin Kwon AuthorDate: Tue Feb 27 23:11:25 2024 -0800 [SPARK-47199][PYTHON][TESTS] Add prefix into TemporaryDirectory to avoid flakiness ### What changes were proposed in this pull request? This PR proposes to set `prefix` for `TemporaryDirectory` to deflake the tests. Sometimes the tests fail because the temporary directory names are the same (https://github.com/apache/spark/actions/runs/8066850485/job/22036007390). ``` File "/__w/spark/spark/python/pyspark/sql/dataframe.py", line ?, in pyspark.sql.dataframe.DataFrame.writeStream Failed example: with tempfile.TemporaryDirectory() as d: # Create a table with Rate source. df.writeStream.toTable( "my_table", checkpointLocation=d) Exception raised: Traceback (most recent call last): File "/usr/lib/python3.11/doctest.py", line 1353, in __run exec(compile(example.source, filename, "single", File "", line 1, in with tempfile.TemporaryDirectory() as d: File "/usr/lib/python3.11/tempfile.py", line 1043, in __exit__ self.cleanup() File "/usr/lib/python3.11/tempfile.py", line 1047, in cleanup self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors) File "/usr/lib/python3.11/tempfile.py", line 1029, in _rmtree _rmtree(name, onerror=onerror) File "/usr/lib/python3.11/shutil.py", line 738, in rmtree onerror(os.rmdir, path, sys.exc_info()) File "/usr/lib/python3.11/shutil.py", line 736, in rmtree os.rmdir(path, dir_fd=dir_fd) OSError: [Errno 39] Directory not empty: '/__w/spark/spark/python/target/4f062b09-213f-4ac2-a10a-2d704990141b/tmp29irqweq' ``` ### Why are the changes needed?
To make the tests more robust. ### Does this PR introduce _any_ user-facing change? No, test-only. There's a bit of user-facing documentation change but pretty trivial. ### How was this patch tested? Manually tested. CI in this PR should test them out as well. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45298 from HyukjinKwon/SPARK-47199. Authored-by: Hyukjin Kwon Signed-off-by: Dongjoon Hyun --- dev/connect-check-protos.py| 2 +- python/pyspark/broadcast.py| 6 +- python/pyspark/context.py | 26 python/pyspark/files.py| 2 +- .../connect/test_legacy_mode_classification.py | 2 +- .../tests/connect/test_legacy_mode_evaluation.py | 6 +- .../ml/tests/connect/test_legacy_mode_feature.py | 6 +- .../ml/tests/connect/test_legacy_mode_pipeline.py | 2 +- .../ml/tests/connect/test_legacy_mode_tuning.py| 2 +- python/pyspark/ml/tests/test_als.py| 2 +- python/pyspark/rdd.py | 18 +++--- python/pyspark/sql/catalog.py | 8 +-- python/pyspark/sql/dataframe.py| 6 +- python/pyspark/sql/protobuf/functions.py | 4 +- python/pyspark/sql/readwriter.py | 73 +++--- python/pyspark/sql/session.py | 2 +- python/pyspark/sql/streaming/readwriter.py | 50 --- .../sql/tests/connect/client/test_artifact.py | 16 ++--- .../sql/tests/connect/test_connect_basic.py| 20 +++--- .../pyspark/sql/tests/streaming/test_streaming.py | 2 +- python/pyspark/sql/tests/test_catalog.py | 2 +- python/pyspark/sql/tests/test_python_datasource.py | 6 +- python/pyspark/sql/tests/test_udf_profiler.py | 4 +- python/pyspark/sql/tests/test_udtf.py | 16 ++--- python/pyspark/tests/test_install_spark.py | 2 +- python/pyspark/tests/test_memory_profiler.py | 4 +- python/pyspark/tests/test_profiler.py | 2 +- python/pyspark/tests/test_shuffle.py | 10 +-- python/pyspark/util.py | 2 +- 29 files changed, 154 insertions(+), 149 deletions(-) diff --git a/dev/connect-check-protos.py b/dev/connect-check-protos.py index 513938f8d4f8..ffc74d7b1608 100755 --- a/dev/connect-check-protos.py +++ 
b/dev/connect-check-protos.py @@ -45,7 +45,7 @@ def run_cmd(
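The fix above passes `prefix=` to Python's `tempfile.TemporaryDirectory` so every test directory gets a distinct, identifiable name. The same idea can be sketched with the JDK's temp-directory API; the class and method names here are illustrative, not part of the patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempDirPrefixExample {
    /** Create a temp directory named `prefix` plus a unique suffix chosen by the OS. */
    public static Path createTestDir(String prefix) {
        try {
            // createTempDirectory guarantees a fresh directory per call, so two tests
            // using the same prefix can never collide on the same path.
            return Files.createTempDirectory(prefix);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path a = createTestDir("spark-test-");
        Path b = createTestDir("spark-test-");
        System.out.println(a.getFileName() + " vs " + b.getFileName());
        Files.delete(a);
        Files.delete(b);
    }
}
```

The prefix does not by itself prevent the `Directory not empty` race shown in the traceback; its value is that each test's directory is unique and attributable, so concurrent doctests stop stepping on each other's paths.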
(spark) branch branch-3.5 updated (cbf25fb633f4 -> b4118e0dbb50)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git from cbf25fb633f4 Revert "[SPARK-45599][CORE] Use object equality in OpenHashSet" add b4118e0dbb50 [SPARK-45599][CORE][3.5] Use object equality in OpenHashSet No new revisions were added by this update. Summary of changes: .../apache/spark/util/collection/OpenHashSet.scala | 16 +++-- .../spark/util/collection/OpenHashMapSuite.scala | 30 + .../spark/util/collection/OpenHashSetSuite.scala | 39 ++ .../sql-tests/analyzer-results/ansi/array.sql.out | 14 .../analyzer-results/ansi/literals.sql.out | 7 .../sql-tests/analyzer-results/array.sql.out | 14 .../sql-tests/analyzer-results/group-by.sql.out| 19 +++ .../sql-tests/analyzer-results/literals.sql.out| 7 .../src/test/resources/sql-tests/inputs/array.sql | 4 +++ .../test/resources/sql-tests/inputs/group-by.sql | 15 + .../test/resources/sql-tests/inputs/literals.sql | 3 ++ .../resources/sql-tests/results/ansi/array.sql.out | 16 + .../sql-tests/results/ansi/literals.sql.out| 8 + .../test/resources/sql-tests/results/array.sql.out | 16 + .../resources/sql-tests/results/group-by.sql.out | 22 .../resources/sql-tests/results/literals.sql.out | 8 + .../apache/spark/sql/DataFrameAggregateSuite.scala | 33 ++ 17 files changed, 268 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-47196][CORE][BUILD][3.4] Fix `core` module to succeed SBT tests
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 5ce628f803d3 [SPARK-47196][CORE][BUILD][3.4] Fix `core` module to succeed SBT tests 5ce628f803d3 is described below commit 5ce628f803d3253ab5f6e97ab4572d73b79f1fd8 Author: Dongjoon Hyun AuthorDate: Tue Feb 27 18:22:08 2024 -0800 [SPARK-47196][CORE][BUILD][3.4] Fix `core` module to succeed SBT tests ### What changes were proposed in this pull request? This PR aims to fix `core` module to succeed SBT tests by preserving `mockito-core`'s `byte-buddy` test dependency. Currently, `Maven` respects `mockito-core`'s byte-buddy dependency while SBT doesn't. **MAVEN** ``` $ build/mvn dependency:tree -pl core | grep byte-buddy ... [INFO] | +- net.bytebuddy:byte-buddy:jar:1.12.10:test [INFO] | +- net.bytebuddy:byte-buddy-agent:jar:1.12.10:test ``` **SBT** ``` $ build/sbt "core/test:dependencyTree" | grep byte-buddy ... [info] | | | | +-net.bytebuddy:byte-buddy:1.12.10 (evicted by: 1.12.18) [info] | | | | +-net.bytebuddy:byte-buddy:1.12.18 ... ``` Note that this happens at `branch-3.4` from Apache Spark 3.4.0~3.4.2 only. branch-3.3/branch-3.5/master are okay. ### Why are the changes needed? **BEFORE** ``` $ build/sbt "core/testOnly *.DAGSchedulerSuite" [info] DAGSchedulerSuite: [info] - [SPARK-3353] parent stage should have lower stage id *** FAILED *** (439 milliseconds) [info] java.lang.IllegalStateException: Could not initialize plugin: interface org.mockito.plugins.MockMaker (alternate: null) ... 
[info] *** 1 SUITE ABORTED *** [info] *** 118 TESTS FAILED *** [error] Error during tests: [error] org.apache.spark.scheduler.DAGSchedulerSuite [error] (core / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful [error] Total time: 48 s, completed Feb 27, 2024, 1:26:27 PM ``` **AFTER** ``` $ build/sbt "core/testOnly *.DAGSchedulerSuite" ... [info] All tests passed. [success] Total time: 22 s, completed Feb 27, 2024, 1:24:34 PM ``` ### Does this PR introduce _any_ user-facing change? No, this is a test-only fix. ### How was this patch tested? Pass the CIs and manual tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45295 from dongjoon-hyun/SPARK-47196. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- pom.xml | 12 1 file changed, 12 insertions(+) diff --git a/pom.xml b/pom.xml index 26f0b71a5114..373d17b76c09 100644 --- a/pom.xml +++ b/pom.xml @@ -423,6 +423,12 @@ org.scalatestplus selenium-4-7_${scala.binary.version} test + + + net.bytebuddy + byte-buddy + + junit @@ -725,6 +731,12 @@ htmlunit-driver ${htmlunit-driver.version} test + + +net.bytebuddy +byte-buddy + + - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: Revert "[SPARK-45599][CORE] Use object equality in OpenHashSet"
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new cbf25fb633f4 Revert "[SPARK-45599][CORE] Use object equality in OpenHashSet" cbf25fb633f4 is described below commit cbf25fb633f4bf2f83a6f6e39aafaa80bf47e160 Author: Dongjoon Hyun AuthorDate: Tue Feb 27 08:38:54 2024 -0800 Revert "[SPARK-45599][CORE] Use object equality in OpenHashSet" This reverts commit 588a55d010fefda7a63cde3b616ac38728fe4cfe. --- .../apache/spark/util/collection/OpenHashSet.scala | 16 ++--- .../spark/util/collection/OpenHashMapSuite.scala | 30 - .../spark/util/collection/OpenHashSetSuite.scala | 39 -- .../sql-tests/analyzer-results/ansi/array.sql.out | 14 .../analyzer-results/ansi/literals.sql.out | 7 .../sql-tests/analyzer-results/array.sql.out | 14 .../sql-tests/analyzer-results/group-by.sql.out| 19 --- .../sql-tests/analyzer-results/literals.sql.out| 7 .../src/test/resources/sql-tests/inputs/array.sql | 4 --- .../test/resources/sql-tests/inputs/group-by.sql | 15 - .../test/resources/sql-tests/inputs/literals.sql | 3 -- .../resources/sql-tests/results/ansi/array.sql.out | 16 - .../sql-tests/results/ansi/literals.sql.out| 8 - .../test/resources/sql-tests/results/array.sql.out | 16 - .../resources/sql-tests/results/group-by.sql.out | 22 .../resources/sql-tests/results/literals.sql.out | 8 - .../apache/spark/sql/DataFrameAggregateSuite.scala | 33 -- 17 files changed, 3 insertions(+), 268 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala b/core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala index 435cf1a03cbc..6815e47a198d 100644 --- a/core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala +++ b/core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala @@ -126,17 +126,6 @@ class 
OpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag]( this } - /** - * Check if a key exists at the provided position using object equality rather than - * cooperative equality. Otherwise, hash sets will mishandle values for which `==` - * and `equals` return different results, like 0.0/-0.0 and NaN/NaN. - * - * See: https://issues.apache.org/jira/browse/SPARK-45599 - */ - @annotation.nowarn("cat=other-non-cooperative-equals") - private def keyExistsAtPos(k: T, pos: Int) = -_data(pos) equals k - /** * Add an element to the set. This one differs from add in that it doesn't trigger rehashing. * The caller is responsible for calling rehashIfNeeded. @@ -157,7 +146,8 @@ class OpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag]( _bitset.set(pos) _size += 1 return pos | NONEXISTENCE_MASK - } else if (keyExistsAtPos(k, pos)) { + } else if (_data(pos) == k) { +// Found an existing key. return pos } else { // quadratic probing with values increase by 1, 2, 3, ... @@ -191,7 +181,7 @@ class OpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag]( while (true) { if (!_bitset.get(pos)) { return INVALID_POS - } else if (keyExistsAtPos(k, pos)) { + } else if (k == _data(pos)) { return pos } else { // quadratic probing with values increase by 1, 2, 3, ... 
diff --git a/core/src/test/scala/org/apache/spark/util/collection/OpenHashMapSuite.scala b/core/src/test/scala/org/apache/spark/util/collection/OpenHashMapSuite.scala index f7b026ab565f..1af99e9017c9 100644 --- a/core/src/test/scala/org/apache/spark/util/collection/OpenHashMapSuite.scala +++ b/core/src/test/scala/org/apache/spark/util/collection/OpenHashMapSuite.scala @@ -249,34 +249,4 @@ class OpenHashMapSuite extends SparkFunSuite with Matchers { map(null) = null assert(map.get(null) === Some(null)) } - - test("SPARK-45599: 0.0 and -0.0 should count distinctly; NaNs should count together") { -// Exactly these elements provided in roughly this order trigger a condition where lookups of -// 0.0 and -0.0 in the bitset happen to collide, causing their counts to be merged incorrectly -// and inconsistently if `==` is used to check for key equality. -val spark45599Repro = Seq( - Double.NaN, - 2.0, - 168.0, - Double.NaN, - Double.NaN, - -0.0, - 153.0, - 0.0 -) - -val map1 = new OpenHashMap[Double, Int]() -spark45599Repro.foreach(map1.changeValue(_, 1, {_ + 1})) -assert(map1(0.0) == 1) -assert(map1(-0.0) == 1) -assert(map1(Double.NaN) == 3) - -val map2 = new OpenHashM
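The reverted change above centers on the difference between cooperative equality (`==`) and object equality (`equals`) for `0.0`/`-0.0` and `NaN`. Scala's boxed-`Double` `equals` delegates to `java.lang.Double.equals`, which compares bit patterns, so the contrast can be shown directly in plain Java:

```java
public class FloatingEquality {
    public static void main(String[] args) {
        // Primitive IEEE-754 comparison (what Scala's `==` follows):
        // 0.0 and -0.0 compare equal, and NaN never equals itself.
        System.out.println(0.0 == -0.0);              // true
        System.out.println(Double.NaN == Double.NaN); // false

        // Object equality (Double.equals) compares doubleToLongBits instead:
        // 0.0 and -0.0 are DISTINCT, and NaN equals NaN.
        System.out.println(Double.valueOf(0.0).equals(Double.valueOf(-0.0)));              // false
        System.out.println(Double.valueOf(Double.NaN).equals(Double.valueOf(Double.NaN))); // true
    }
}
```

This is exactly why a hash set keyed with `==` mishandles these values: `0.0` and `-0.0` can merge into one entry while two `NaN` insertions can land in separate entries, the miscount the deleted SPARK-45599 test reproduced.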
(spark) branch master updated: [SPARK-47185][SS][TESTS] Increase timeout between actions in KafkaContinuousSourceSuite
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ba773fe9a286 [SPARK-47185][SS][TESTS] Increase timeout between actions in KafkaContinuousSourceSuite ba773fe9a286 is described below commit ba773fe9a28640af6d2ddebd8104b05e28778f58 Author: Hyukjin Kwon AuthorDate: Tue Feb 27 07:36:08 2024 -0800 [SPARK-47185][SS][TESTS] Increase timeout between actions in KafkaContinuousSourceSuite ### What changes were proposed in this pull request? This PR proposes to increase the timeout between actions in `KafkaContinuousSourceSuite`. ### Why are the changes needed? In Mac OS build, those tests fail non-deterministically, see - https://github.com/apache/spark/actions/runs/8054862135/job/22000404856 - https://github.com/apache/spark/actions/runs/8040413156/job/21958488693 - https://github.com/apache/spark/actions/runs/8032862212/job/21942732320 - https://github.com/apache/spark/actions/runs/8024427919/job/21937366481 `KafkaContinuousSourceSuite` is specifically slow in Mac OS. Kafka producers send the messages correctly, but the consumers can't get the messages for some reason. You can't get the offsets for a long time. This is not an issue in micro batch, but I fail to identify the difference. I just decided to increase the timeout between actions for now. This is more of a workaround. ### Does this PR introduce _any_ user-facing change? No, test-only. ### How was this patch tested? Manually tested in my Mac. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45283 from HyukjinKwon/SPARK-47185.
Authored-by: Hyukjin Kwon Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/sql/kafka010/KafkaContinuousSourceSuite.scala | 3 +++ 1 file changed, 3 insertions(+) diff --git a/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaContinuousSourceSuite.scala b/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaContinuousSourceSuite.scala index e42662c7a62b..fa1db6bfaccc 100644 --- a/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaContinuousSourceSuite.scala +++ b/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaContinuousSourceSuite.scala @@ -18,6 +18,7 @@ package org.apache.spark.sql.kafka010 import org.apache.kafka.clients.producer.ProducerRecord +import org.scalatest.time.SpanSugar._ import org.apache.spark.sql.Dataset import org.apache.spark.sql.execution.datasources.v2.ContinuousScanExec @@ -28,6 +29,8 @@ import org.apache.spark.sql.streaming.Trigger class KafkaContinuousSourceSuite extends KafkaSourceSuiteBase with KafkaContinuousTest { import testImplicits._ + override val streamingTimeout = 60.seconds + test("read Kafka transactional messages: read_committed") { val table = "kafka_continuous_source_test" withTable(table) { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47182][BUILD] Exclude `commons-(io|lang3)` transitive dependencies from `commons-compress` and `avro*`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 209d0fcf22b1 [SPARK-47182][BUILD] Exclude `commons-(io|lang3)` transitive dependencies from `commons-compress` and `avro*` 209d0fcf22b1 is described below commit 209d0fcf22b174c308d2ae239795d6193e2ca85e Author: Dongjoon Hyun AuthorDate: Mon Feb 26 22:57:01 2024 -0800 [SPARK-47182][BUILD] Exclude `commons-(io|lang3)` transitive dependencies from `commons-compress` and `avro*` ### Why are the changes needed? This PR aims to exclude `commons-(io|lang3)` transitive dependencies from `commons-compress`, `avro`, and `avro-mapred` dependencies. ### Does this PR introduce _any_ user-facing change? Apache Spark defines and uses its own versions. The exclusion of the transitive dependencies will clarify that. https://github.com/apache/spark/blob/1a408033daf458f1ceebbe14a560355a1a2c0a70/pom.xml#L198 https://github.com/apache/spark/blob/1a408033daf458f1ceebbe14a560355a1a2c0a70/pom.xml#L194 ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45278 from dongjoon-hyun/SPARK-47182. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- pom.xml | 28 1 file changed, 28 insertions(+) diff --git a/pom.xml b/pom.xml index 22606caaf65c..8e977395378c 100644 --- a/pom.xml +++ b/pom.xml @@ -619,6 +619,16 @@ org.apache.commons commons-compress ${commons-compress.version} + + +commons-io +commons-io + + +org.apache.commons +commons-lang3 + + org.apache.commons @@ -1484,6 +1494,16 @@ org.apache.avro avro ${avro.version} + + +commons-io +commons-io + + +org.apache.commons +commons-lang3 + + org.apache.avro @@ -1523,6 +1543,14 @@ com.github.luben zstd-jni + +commons-io +commons-io + + +org.apache.commons +commons-lang3 +
(spark) branch master updated (031b90b2ac0b -> 1a408033daf4)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 031b90b2ac0b [SPARK-47178][PYTHON][TESTS] Add a test case for createDataFrame with dataclasses add 1a408033daf4 [SPARK-47181][CORE][TESTS] Fix `MasterSuite` to validate the number of registered workers No new revisions were added by this update. Summary of changes: .../src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-)
(spark) branch master updated: [SPARK-45527][CORE][TESTS][FOLLOW-UP] Reduce the number of test cases in fraction resource calculation
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new afa9f9679bc0 [SPARK-45527][CORE][TESTS][FOLLOW-UP] Reduce the number of test cases in fraction resource calculation afa9f9679bc0 is described below commit afa9f9679bc01e8afbf7e4a47c203bfcc1a0652a Author: Hyukjin Kwon AuthorDate: Mon Feb 26 18:54:07 2024 -0800 [SPARK-45527][CORE][TESTS][FOLLOW-UP] Reduce the number of test cases in fraction resource calculation ### What changes were proposed in this pull request? There are two more instances that https://github.com/apache/spark/pull/45268 mistakenly missed. This PR fixes both. ### Why are the changes needed? See https://github.com/apache/spark/pull/45268 ### Does this PR introduce _any_ user-facing change? No, test-only. ### How was this patch tested? Manually ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45272 from HyukjinKwon/SPARK-45527-followup2.
Authored-by: Hyukjin Kwon Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala index 3248e64bcc58..df5031e05887 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala @@ -2374,7 +2374,7 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext // 1 executor with 4 GPUS Seq(true, false).foreach { barrierMode => val barrier = if (barrierMode) "barrier" else "" -(1 to 20).foreach { taskNum => +scala.util.Random.shuffle((1 to 20).toList).take(5).foreach { taskNum => val gpuTaskAmount = ResourceAmountUtils.toFractionalResource(ONE_ENTIRE_RESOURCE / taskNum) test(s"SPARK-45527 TaskResourceProfile with task.gpu.amount=${gpuTaskAmount} can " + s"restrict $taskNum $barrier tasks run in the same executor") { @@ -2423,7 +2423,7 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext // 4 executors, each of which has 1 GPU Seq(true, false).foreach { barrierMode => val barrier = if (barrierMode) "barrier" else "" -(1 to 20).foreach { taskNum => +scala.util.Random.shuffle((1 to 20).toList).take(5).foreach { taskNum => val gpuTaskAmount = ResourceAmountUtils.toFractionalResource(ONE_ENTIRE_RESOURCE / taskNum) test(s"SPARK-45527 TaskResourceProfile with task.gpu.amount=${gpuTaskAmount} can " + s"restrict $taskNum $barrier tasks run on the different executor") { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
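The `scala.util.Random.shuffle((1 to 20).toList).take(5)` pattern in the diff above trades exhaustive parameter coverage for a random 5-case subset per run, so the full space is still exercised across many CI runs at a fraction of the per-run cost. A minimal Java sketch of the same sampling idea (class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SampledCases {
    /** Shuffle 1..upper and keep `count` cases, mirroring Random.shuffle((1 to upper).toList).take(count). */
    public static List<Integer> sample(int upper, int count) {
        List<Integer> all = IntStream.rangeClosed(1, upper)
                .boxed()
                .collect(Collectors.toList());
        Collections.shuffle(all); // a different subset of the parameter space each run
        return new ArrayList<>(all.subList(0, count));
    }

    public static void main(String[] args) {
        // Each invocation picks 5 distinct task counts out of 1..20.
        System.out.println(sample(20, 5));
    }
}
```

Because the subset changes per run, a failure is not directly reproducible from the test name alone; logging the sampled values (as the generated test names above do via `taskNum`) keeps failures diagnosable.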
(spark) branch master updated: [SPARK-45527][CORE][TESTS][FOLLOWUP] Reduce the number of threads from 1k to 100 in `TaskSchedulerImplSuite`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 76c4fd56c5a5 [SPARK-45527][CORE][TESTS][FOLLOWUP] Reduce the number of threads from 1k to 100 in `TaskSchedulerImplSuite` 76c4fd56c5a5 is described below commit 76c4fd56c5a53bf9f726820a44ca0f610f7b91f6 Author: Dongjoon Hyun AuthorDate: Mon Feb 26 14:32:10 2024 -0800 [SPARK-45527][CORE][TESTS][FOLLOWUP] Reduce the number of threads from 1k to 100 in `TaskSchedulerImplSuite` ### What changes were proposed in this pull request? This PR is a follow-up of #43494 in order to reduce the number of threads of SparkContext from 1k to 100 in the test environment. ### Why are the changes needed? To reduce the test resource requirement. 1000 threads seem to be too large for some CI systems with a limited resource. - https://github.com/apache/spark/actions/workflows/build_maven_java21_macos14.yml - https://github.com/apache/spark/actions/runs/8054862135/job/22000403549 ``` Warning: [766.327s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 4096k, guardsize: 16k, detached. Warning: [766.327s][warning][os,thread] Failed to start the native thread for java.lang.Thread "dispatcher-event-loop-840" *** RUN ABORTED *** An exception or error caused a run to abort: unable to create native thread: possibly out of memory or process/resource limits reached java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached ``` ### Does this PR introduce _any_ user-facing change? No, this is a test-case update. ### How was this patch tested? Pass the CIs and monitor Daily Apple Silicon test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45264 from dongjoon-hyun/SPARK-45527. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala index 3e43442583ec..f7b868c66468 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala @@ -2489,7 +2489,7 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext val taskCpus = 1 val taskGpus = 0.3 val executorGpus = 4 -val executorCpus = 1000 +val executorCpus = 100 // each tasks require 0.3 gpu val taskScheduler = setupScheduler(numCores = executorCpus, - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (298134fd5e98 -> a939a7d0fd9c)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 298134fd5e98 [SPARK-47009][SQL] Enable create table support for collation add a939a7d0fd9c [SPARK-47170][BUILD][CONNECT] Remove `jakarta.servlet-api` and `javax.servlet-api` dependency scope in `connect/server` module No new revisions were added by this update. Summary of changes: connector/connect/server/pom.xml | 2 -- 1 file changed, 2 deletions(-)
(spark) branch master updated: [SPARK-47163][BUILD] Fix `make-distribution.sh` to check `jackson-core-asl-1.9.13.jar` existence first
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 242dd2e819b4 [SPARK-47163][BUILD] Fix `make-distribution.sh` to check `jackson-core-asl-1.9.13.jar` existence first 242dd2e819b4 is described below commit 242dd2e819b4512fd46e6d02b1bd0f937ad5419d Author: Dongjoon Hyun AuthorDate: Sun Feb 25 23:54:21 2024 -0800 [SPARK-47163][BUILD] Fix `make-distribution.sh` to check `jackson-core-asl-1.9.13.jar` existence first ### What changes were proposed in this pull request? This PR aims to fix the `make-distribution.sh` script to check for `jackson-*-asl-*.jar` existence before copying. ### Why are the changes needed? Currently, the `make-distribution.sh` script fails when building without `hive-thriftserver`. ### Does this PR introduce _any_ user-facing change? No, this bug was introduced by an unreleased feature. ### How was this patch tested? Pass the CIs and manually build without Hive like the following. ``` $ dev/make-distribution.sh $ ls dist/ LICENSE NOTICE README.md RELEASE bin conf data examples jars kubernetes licenses python sbin ``` ``` $ dev/make-distribution.sh -Phive-thriftserver $ ls dist LICENSE NOTICE README.md RELEASE bin conf data examples hive-jackson jars licenses python sbin ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45253 from dongjoon-hyun/SPARK-47163.
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/make-distribution.sh | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/dev/make-distribution.sh b/dev/make-distribution.sh index 5c4c36df37a6..70684a02a8dd 100755 --- a/dev/make-distribution.sh +++ b/dev/make-distribution.sh @@ -190,10 +190,12 @@ echo "Build flags: $@" >> "$DISTDIR/RELEASE" cp "$SPARK_HOME"/assembly/target/scala*/jars/* "$DISTDIR/jars/" # Only create the hive-jackson directory if they exist. -for f in "$DISTDIR"/jars/jackson-*-asl-*.jar; do - mkdir -p "$DISTDIR"/hive-jackson - mv $f "$DISTDIR"/hive-jackson/ -done +if [ -f "$DISTDIR"/jars/jackson-core-asl-1.9.13.jar ]; then + for f in "$DISTDIR"/jars/jackson-*-asl-*.jar; do +mkdir -p "$DISTDIR"/hive-jackson +mv $f "$DISTDIR"/hive-jackson/ + done +fi # Only create the yarn directory if the yarn artifacts were built. if [ -f "$SPARK_HOME"/common/network-yarn/target/scala*/spark-*-yarn-shuffle.jar ]; then
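The guard added in the diff above can be paraphrased in Python. This is a hypothetical sketch only (the real logic lives in `dev/make-distribution.sh`, and the `relocate_hive_jackson` helper name is invented here): relocate the Hive Jackson jars into `hive-jackson/` only when the sentinel jar `jackson-core-asl-1.9.13.jar` exists, so a build without `-Phive-thriftserver` no longer fails on an unmatched glob.

```python
import fnmatch
import os
import shutil


def relocate_hive_jackson(dist_dir: str) -> list:
    """Move jackson-*-asl-*.jar from jars/ to hive-jackson/ if they exist."""
    jars = os.path.join(dist_dir, "jars")
    # The existence check added by SPARK-47163: do nothing without the sentinel jar.
    sentinel = os.path.join(jars, "jackson-core-asl-1.9.13.jar")
    moved = []
    if not os.path.isfile(sentinel):
        return moved
    dest = os.path.join(dist_dir, "hive-jackson")
    os.makedirs(dest, exist_ok=True)
    for name in os.listdir(jars):
        if fnmatch.fnmatch(name, "jackson-*-asl-*.jar"):
            shutil.move(os.path.join(jars, name), os.path.join(dest, name))
            moved.append(name)
    return moved
```

Unlike the original unguarded `for` loop, the empty-distribution case now returns early without ever creating the `hive-jackson` directory.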
(spark) branch master updated: [SPARK-47160][K8S] Update K8s `Dockerfile` to include `hive-jackson` directory if exists
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5c5b47c5ad51 [SPARK-47160][K8S] Update K8s `Dockerfile` to include `hive-jackson` directory if exists 5c5b47c5ad51 is described below commit 5c5b47c5ad51131fd92a5682140481361b023d51 Author: Dongjoon Hyun AuthorDate: Sun Feb 25 23:21:49 2024 -0800 [SPARK-47160][K8S] Update K8s `Dockerfile` to include `hive-jackson` directory if exists ### What changes were proposed in this pull request? This PR aims to update K8s `Dockerfile` to include `hive-jackson` jar directory if exists. ### Why are the changes needed? After SPARK-47152, we can have `hive-jackson` directory. ### Does this PR introduce _any_ user-facing change? No, this is used by Spark internal by default. ### How was this patch tested? Pass the CIs and manual check. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45251 from dongjoon-hyun/SPARK-47160. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../kubernetes/docker/src/main/dockerfiles/spark/Dockerfile | 2 ++ 1 file changed, 2 insertions(+) diff --git a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile index 25d7e076169b..421639cf2880 100644 --- a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile +++ b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile @@ -42,6 +42,8 @@ RUN set -ex && \ rm -rf /var/cache/apt/* && rm -rf /var/lib/apt/lists/* COPY jars /opt/spark/jars +# Copy hive-jackson directory if exists +COPY hive-jackso[n] /opt/spark/hive-jackson # Copy RELEASE file if exists COPY RELEAS[E] /opt/spark/RELEASE COPY bin /opt/spark/bin
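The `COPY hive-jackso[n]` line in the diff above reuses the same glob trick as the existing `COPY RELEAS[E]` line: a one-character class that matches exactly the intended name, but, because the source is a pattern rather than a literal path, the `COPY` does not fail the build when nothing matches. The matching rule itself can be illustrated with Python's glob-style matcher (the `matches` helper is ours, not part of Spark or Docker):

```python
import fnmatch


def matches(name: str) -> bool:
    # `[n]` is a character class matching a single 'n', so only the literal
    # directory name "hive-jackson" satisfies the pattern.
    return fnmatch.fnmatch(name, "hive-jackso[n]")
```

So `hive-jackson` matches while `hive-jackso` or `hive-jacksons` do not; if the directory is absent, the pattern simply matches nothing and the image builds without it.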
(spark) branch master updated: [SPARK-47161][INFRA][R] Uses hash key properly for SparkR build on Windows
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1ca60d353792 [SPARK-47161][INFRA][R] Uses hash key properly for SparkR build on Windows 1ca60d353792 is described below commit 1ca60d353792d2f823a35354376d34797bfc60c2 Author: Hyukjin Kwon AuthorDate: Sun Feb 25 23:17:33 2024 -0800 [SPARK-47161][INFRA][R] Uses hash key properly for SparkR build on Windows ### What changes were proposed in this pull request? This PR fixes the mistake in https://github.com/apache/spark/pull/45175 that sets the hash key wrongly for Maven cache. ### Why are the changes needed? To use the cache properly. SparkR on Windows does not find its cache properly: https://github.com/apache/spark/actions/runs/8039485831/job/2195633 ![Screenshot 2024-02-26 at 2 48 07 PM](https://github.com/apache/spark/assets/6477701/1c151c04-c07c-4968-af3a-b745cc7af391) ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? Will monitor the CI. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45252 from HyukjinKwon/SPARK-47161. 
Authored-by: Hyukjin Kwon Signed-off-by: Dongjoon Hyun --- .github/workflows/build_sparkr_window.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/build_sparkr_window.yml b/.github/workflows/build_sparkr_window.yml index a7a265965662..155422d22e03 100644 --- a/.github/workflows/build_sparkr_window.yml +++ b/.github/workflows/build_sparkr_window.yml @@ -42,7 +42,7 @@ jobs: uses: actions/cache@v4 with: path: ~/.m2/repository -key: build-sparkr-maven-${{ hashFiles('**/pom.xml') }} +key: build-sparkr-windows-maven-${{ hashFiles('**/pom.xml') }} restore-keys: | build-sparkr-windows-maven- - name: Install Java 17
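The fix works because the cache `key` must share its prefix with `restore-keys` for lookups to hit; the key itself is a fixed prefix plus `hashFiles('**/pom.xml')`, so the cache is invalidated whenever any `pom.xml` changes. A rough Python analogue of that fingerprint (this is an illustrative sketch, not the actual `hashFiles` algorithm, and `pom_fingerprint` is an invented name):

```python
import hashlib
from pathlib import Path


def pom_fingerprint(root: str) -> str:
    # Hash all pom.xml contents in a deterministic (sorted) order, then
    # prepend the fixed cache-key prefix used by the workflow.
    digest = hashlib.sha256()
    for path in sorted(Path(root).rglob("pom.xml")):
        digest.update(path.read_bytes())
    return "build-sparkr-windows-maven-" + digest.hexdigest()
```

Two runs over unchanged pom files produce the same key (a cache hit); touching any pom file yields a new key, falling back to the `build-sparkr-windows-maven-` restore prefix.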
(spark) branch master updated (679b468854ab -> 0ff18e579c2f)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 679b468854ab [MINOR][CONNECT][TESTS] Chain waitFor after destroyForcibly in SparkConnectServerUtils add 0ff18e579c2f [SPARK-46802][PYTHON][TESTS][FOLLOWUP] Remove obsolete comment in run-tests-with-coverage No new revisions were added by this update. Summary of changes: python/run-tests-with-coverage | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated: [SPARK-47154][SS][TESTS] Fix `kafka-0-10-sql` to use `ResetSystemProperties` if `KafkaTestUtils` is used
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 18b86068ff4c [SPARK-47154][SS][TESTS] Fix `kafka-0-10-sql` to use `ResetSystemProperties` if `KafkaTestUtils` is used 18b86068ff4c is described below commit 18b86068ff4c72ba686d3d9275f9284d58cd3ef4 Author: Dongjoon Hyun AuthorDate: Sat Feb 24 11:15:05 2024 -0800 [SPARK-47154][SS][TESTS] Fix `kafka-0-10-sql` to use `ResetSystemProperties` if `KafkaTestUtils` is used ### What changes were proposed in this pull request? This PR aims to fix `kafka-0-10-sql` module to use `ResetSystemProperties` if `KafkaTestUtils` is used. The following test suites are fixed. - ConsumerStrategySuite - KafkaDataConsumerSuite - KafkaMissingOffsetsTest - KafkaDontFailOnDataLossSuite - KafkaSourceStressForDontFailOnDataLossSuite - KafkaTest - KafkaDelegationTokenSuite - KafkaMicroBatchSourceSuite - KafkaMicroBatchV1SourceWithAdminSuite - KafkaMicroBatchV2SourceWithAdminSuite - KafkaMicroBatchV1SourceSuite - KafkaMicroBatchV2SourceSuite - KafkaSourceStressSuite - KafkaOffsetReaderSuite - KafkaRelationSuite - KafkaRelationSuiteWithAdminV1 - KafkaRelationSuiteWithAdminV2 - KafkaRelationSuiteV1 - KafkaRelationSuiteV2 - KafkaSinkSuite - KafkaSinkMicroBatchStreamingSuite - KafkaContinuousSinkSuite - KafkaSinkBatchSuiteV1 - KafkaSinkBatchSuiteV2 ### Why are the changes needed? Apache Spark `master` branch has two `KafkaTestUtils` classes. ``` $ find . -name KafkaTestUtils.scala ./connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala ./connector/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala ``` `KafkaTestUtils` of `kafka-0-10-sql` uses `System.setProperty` and affects 8 files. We need to use `ResetSystemProperties` to isolate the test cases. 
https://github.com/apache/spark/blob/ee312ecb40ea5b5303fc794a3d494b6f27cda923/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala#L290 ``` $ git grep KafkaTestUtils connector/kafka-0-10-sql | awk -F: '{print $1}' | sort | uniq connector/kafka-0-10-sql/src/test/resources/log4j2.properties connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/ConsumerStrategySuite.scala connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDelegationTokenSuite.scala connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDontFailOnDataLossSuite.scala connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaOffsetReaderSuite.scala connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaRelationSuite.scala connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSinkSuite.scala connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/consumer/KafkaDataConsumerSuite.scala ``` ### Does this PR introduce _any_ user-facing change? No. This is a test-only PR. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45239 from dongjoon-hyun/SPARK-47154. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/kafka010/ConsumerStrategySuite.scala| 3 ++- .../org/apache/spark/sql/kafka010/KafkaDontFailOnDataLossSuite.scala | 3 ++- .../src/test/scala/org/apache/spark/sql/kafka010/KafkaTest.scala | 3 ++- .../apache/spark/sql/kafka010/consumer/KafkaDataConsumerSuite.scala| 2 ++ 4 files changed, 8 insertions(+), 3 deletions(-) diff --git a/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/ConsumerStrategySuite.scala b/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/ConsumerStrategySuite.scala index 44baab7f2468..cbbbcf9317cd 100644 --- a/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/ConsumerStrategySuite.scala +++ b/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/ConsumerStrategySuite.scala @@ -27,8 +27,9 @@ import org.apache.kafka.common.TopicPartition import org.mockito.Mockito.mock import org.apache.spark.{SparkConf, SparkEnv, SparkFunSuite} +import org.apache.spark.util.ResetSystemProperties -class ConsumerStrategySui
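The `ResetSystemProperties` trait mixed into the suites above snapshots JVM system properties before each test and restores them afterwards, which isolates suites from `KafkaTestUtils`' `System.setProperty` calls. A Python analogue of the same save/restore pattern, expressed as a context manager over `os.environ` (an assumed stand-in for JVM system properties; `reset_environ` is an invented name):

```python
import contextlib
import os


@contextlib.contextmanager
def reset_environ():
    # Snapshot the environment, let the test body mutate it freely,
    # then restore the snapshot even if the body raises.
    saved = dict(os.environ)
    try:
        yield
    finally:
        os.environ.clear()
        os.environ.update(saved)
```

Any variable set or overwritten inside the `with` block disappears once the block exits, mirroring how the Scala trait keeps one suite's property mutations from leaking into the next.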
(spark) branch master updated: [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new dd3f81c3d610 [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory dd3f81c3d610 is described below commit dd3f81c3d6102fe1427702e97f7f42aa64b0bf5e Author: Dongjoon Hyun AuthorDate: Sat Feb 24 11:05:41 2024 -0800 [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory ### What changes were proposed in this pull request? This PR aims to provide `Apache Hive`'s `CodeHaus Jackson` dependencies via a new optional directory, `hive-jackson`, instead of the standard `jars` directory of Apache Spark binary distribution. Additionally, two internal configurations are added whose default values are `hive-jackson/*`. - `spark.driver.defaultExtraClassPath` - `spark.executor.defaultExtraClassPath` For example, Apache Spark distributions have been providing `spark-*-yarn-shuffle.jar` file under `yarn` directory instead of `jars`. **YARN SHUFFLE EXAMPLE** ``` $ ls -al yarn/*jar -rw-r--r-- 1 dongjoon staff 77352048 Sep 8 19:08 yarn/spark-3.5.0-yarn-shuffle.jar ``` This PR changes `Apache Hive`'s `CodeHaus Jackson` dependencies in a similar way. **BEFORE** ``` $ ls -al jars/*asl* -rw-r--r-- 1 dongjoon staff 232248 Sep 8 19:08 jars/jackson-core-asl-1.9.13.jar -rw-r--r-- 1 dongjoon staff 780664 Sep 8 19:08 jars/jackson-mapper-asl-1.9.13.jar ``` **AFTER** ``` $ ls -al jars/*asl* zsh: no matches found: jars/*asl* $ ls -al hive-jackson total 1984 drwxr-xr-x 4 dongjoon staff 128 Feb 23 15:37 . drwxr-xr-x 16 dongjoon staff 512 Feb 23 16:34 .. -rw-r--r-- 1 dongjoon staff 232248 Feb 23 15:37 jackson-core-asl-1.9.13.jar -rw-r--r-- 1 dongjoon staff 780664 Feb 23 15:37 jackson-mapper-asl-1.9.13.jar ``` ### Why are the changes needed? 
Since Apache Hadoop 3.3.5, only Apache Hive requires old CodeHaus Jackson dependencies. Apache Spark 3.5.0 tried to eliminate them completely but it was reverted due to Hive UDF support. - https://github.com/apache/spark/pull/40893 - https://github.com/apache/spark/pull/42446 SPARK-47119 added a way to exclude Apache Hive Jackson dependencies at the distribution building stage for Apache Spark 4.0.0. - #45201 This PR provides a way to exclude Apache Hive Jackson dependencies at runtime for Apache Spark 4.0.0. - Spark Shell without Apache Hive Jackson dependencies. ``` $ bin/spark-shell --driver-default-class-path "" ``` - Spark SQL Shell without Apache Hive Jackson dependencies. ``` $ bin/spark-sql --driver-default-class-path "" ``` - Spark Thrift Server without Apache Hive Jackson dependencies. ``` $ sbin/start-thriftserver.sh --driver-default-class-path "" ``` In addition, last but not least, this PR eliminates `CodeHaus Jackson` dependencies from the following Apache Spark daemons (using `spark-daemon.sh start`) because they don't require Hive `CodeHaus Jackson` dependencies. - Spark Master - Spark Worker - Spark History Server ``` $ grep 'spark-daemon.sh start' * start-history-server.sh:exec "${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 1 "$" start-master.sh:"${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 1 \ start-worker.sh: "${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS $WORKER_NUM \ ``` ### Does this PR introduce _any_ user-facing change? No. There is no user-facing change by default. - For the distributions with the `hive-jackson-provided` profile, the `scope` of Apache Hive Jackson dependencies is `provided` and the `hive-jackson` directory is not created at all. - For the distributions with the default setting, the `scope` of Apache Hive Jackson dependencies is still `compile`. In addition, they are on Apache Spark's built-in class path like the following. 
![Screenshot 2024-02-23 at 16 48 08](https://github.com/apache/spark/assets/9700541/99ed0f02-2792-4666-ae19-ce4f4b7b8ff9) - The following Spark daemons don't use `CodeHaus Jackson` dependencies. - Spark Master - Spark Worker - Spark History Server ### How was this patch tested? Pass the CIs and manually build a distribution and check the class paths in the `Environment` Tab. ``` $ dev/make-distribution.sh -Phive,hive-thriftserver ``` ### Was this patch authored or co-authored using generative
(spark) branch master updated (c2dbb6d04bc9 -> ee312ecb40ea)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from c2dbb6d04bc9 [SPARK-47099][SQL][FOLLOWUP] Regenerate `try_arithmetic.sql.out.java21` add ee312ecb40ea [SPARK-47151][PYTHON][PS][BUILD] Upgrade to `pandas` 2.2.1 No new revisions were added by this update. Summary of changes: dev/infra/Dockerfile | 4 ++-- python/pyspark/pandas/supported_api_gen.py | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-)
(spark) branch master updated (772a445e412b -> c2dbb6d04bc9)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 772a445e412b [SPARK-47035][SS][CONNECT] Protocol for Client-Side Listener add c2dbb6d04bc9 [SPARK-47099][SQL][FOLLOWUP] Regenerate `try_arithmetic.sql.out.java21` No new revisions were added by this update. Summary of changes: .../src/test/resources/sql-tests/results/try_arithmetic.sql.out.java21 | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated (d20650bc8cf2 -> 28951ed6681f)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from d20650bc8cf2 [SPARK-46975][PS] Support dedicated fallback methods add 28951ed6681f [SPARK-47118][BUILD][CORE][SQL][UI] Migrate from Jetty 10 to Jetty 11 No new revisions were added by this update. Summary of changes: connector/connect/server/pom.xml | 10 ++ .../sql/connect/ui/SparkConnectServerPage.scala| 3 +- .../connect/ui/SparkConnectServerSessionPage.scala | 4 +- .../connect/ui/SparkConnectServerPageSuite.scala | 2 +- core/pom.xml | 5 +- .../main/scala/org/apache/spark/deploy/Utils.scala | 3 +- .../spark/deploy/history/ApplicationCache.scala| 4 +- .../apache/spark/deploy/history/HistoryPage.scala | 4 +- .../spark/deploy/history/HistoryServer.scala | 2 +- .../org/apache/spark/deploy/history/LogPage.scala | 4 +- .../spark/deploy/master/ui/ApplicationPage.scala | 4 +- .../apache/spark/deploy/master/ui/LogPage.scala| 4 +- .../apache/spark/deploy/master/ui/MasterPage.scala | 3 +- .../spark/deploy/master/ui/MasterWebUI.scala | 3 +- .../spark/deploy/rest/RestSubmissionClient.scala | 2 +- .../spark/deploy/rest/RestSubmissionServer.scala | 3 +- .../spark/deploy/rest/StandaloneRestServer.scala | 3 +- .../apache/spark/deploy/worker/ui/LogPage.scala| 3 +- .../apache/spark/deploy/worker/ui/WorkerPage.scala | 3 +- .../spark/deploy/worker/ui/WorkerWebUI.scala | 3 +- .../apache/spark/metrics/sink/MetricsServlet.scala | 2 +- .../spark/metrics/sink/PrometheusServlet.scala | 2 +- .../spark/status/api/v1/ApiRootResource.scala | 8 +- .../status/api/v1/ApplicationListResource.scala| 5 +- .../spark/status/api/v1/JacksonMessageWriter.scala | 6 +- .../status/api/v1/OneApplicationResource.scala | 5 +- .../spark/status/api/v1/PrometheusResource.scala | 5 +- .../spark/status/api/v1/SimpleDateParam.scala | 7 +- .../spark/status/api/v1/StagesResource.scala | 5 +- .../scala/org/apache/spark/ui/DriverLogPage.scala | 4 +- 
.../scala/org/apache/spark/ui/GraphUIData.scala| 3 +- .../org/apache/spark/ui/HttpSecurityFilter.scala | 4 +- .../scala/org/apache/spark/ui/JettyUtils.scala | 4 +- .../scala/org/apache/spark/ui/PagedTable.scala | 2 +- .../main/scala/org/apache/spark/ui/SparkUI.scala | 2 +- .../main/scala/org/apache/spark/ui/UIUtils.scala | 4 +- .../src/main/scala/org/apache/spark/ui/WebUI.scala | 4 +- .../org/apache/spark/ui/env/EnvironmentPage.scala | 4 +- .../spark/ui/exec/ExecutorHeapHistogramPage.scala | 4 +- .../spark/ui/exec/ExecutorThreadDumpPage.scala | 4 +- .../org/apache/spark/ui/exec/ExecutorsTab.scala| 4 +- .../org/apache/spark/ui/jobs/AllJobsPage.scala | 2 +- .../org/apache/spark/ui/jobs/AllStagesPage.scala | 4 +- .../scala/org/apache/spark/ui/jobs/JobPage.scala | 2 +- .../scala/org/apache/spark/ui/jobs/JobsTab.scala | 2 +- .../scala/org/apache/spark/ui/jobs/PoolPage.scala | 4 +- .../scala/org/apache/spark/ui/jobs/PoolTable.scala | 3 +- .../scala/org/apache/spark/ui/jobs/StagePage.scala | 3 +- .../org/apache/spark/ui/jobs/StageTable.scala | 3 +- .../scala/org/apache/spark/ui/jobs/StagesTab.scala | 2 +- .../apache/spark/ui/jobs/TaskThreadDumpPage.scala | 4 +- .../org/apache/spark/ui/storage/RDDPage.scala | 3 +- .../org/apache/spark/ui/storage/StoragePage.scala | 4 +- .../main/scala/org/apache/spark/util/Utils.scala | 2 +- .../deploy/history/ApplicationCacheSuite.scala | 2 +- .../deploy/history/HistoryServerPageSuite.scala| 2 +- .../spark/deploy/history/HistoryServerSuite.scala | 4 +- .../history/RealBrowserUIHistoryServerSuite.scala | 3 +- .../deploy/master/ui/ApplicationPageSuite.scala| 2 +- .../master/ui/ReadOnlyMasterWebUISuite.scala | 3 +- .../deploy/rest/StandaloneRestSubmitSuite.scala| 2 +- .../spark/status/api/v1/SimpleDateParamSuite.scala | 3 +- .../org/apache/spark/ui/DriverLogPageSuite.scala | 2 +- .../apache/spark/ui/HttpSecurityFilterSuite.scala | 4 +- .../scala/org/apache/spark/ui/StagePageSuite.scala | 3 +- .../org/apache/spark/ui/UISeleniumSuite.scala | 2 
+- .../test/scala/org/apache/spark/ui/UISuite.scala | 4 +- .../apache/spark/ui/env/EnvironmentPageSuite.scala | 3 +- .../apache/spark/ui/storage/StoragePageSuite.scala | 3 +- dev/deps/spark-deps-hadoop-3-hive-2.3 | 36 +++--- docs/core-migration-guide.md | 2 + mllib/pom.xml | 4 + pom.xml| 26 ++-- project/MimaExcludes.scala | 5 +- project/SparkBuild.scala | 4 +- .../deploy
(spark) branch master updated: [SPARK-47129][CONNECT][SQL] Make `ResolveRelations` cache connect plan properly
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 06c741a0061b [SPARK-47129][CONNECT][SQL] Make `ResolveRelations` cache connect plan properly 06c741a0061b is described below commit 06c741a0061bcf2c6e2c08212cab9f4e774cb70a Author: Ruifeng Zheng AuthorDate: Fri Feb 23 09:26:13 2024 -0800 [SPARK-47129][CONNECT][SQL] Make `ResolveRelations` cache connect plan properly ### What changes were proposed in this pull request? Make `ResolveRelations` handle plan id properly ### Why are the changes needed? bug fix for Spark Connect, it won't affect classic Spark SQL before this PR: ``` from pyspark.sql import functions as sf spark.range(10).withColumn("value_1", sf.lit(1)).write.saveAsTable("test_table_1") spark.range(10).withColumnRenamed("id", "index").withColumn("value_2", sf.lit(2)).write.saveAsTable("test_table_2") df1 = spark.read.table("test_table_1") df2 = spark.read.table("test_table_2") df3 = spark.read.table("test_table_1") join1 = df1.join(df2, on=df1.id==df2.index).select(df2.index, df2.value_2) join2 = df3.join(join1, how="left", on=join1.index==df3.id) join2.schema ``` fails with ``` AnalysisException: [CANNOT_RESOLVE_DATAFRAME_COLUMN] Cannot resolve dataframe column "id". It's probably because of illegal references like `df1.select(df2.col("a"))`. SQLSTATE: 42704 ``` That is due to existing plan caching in `ResolveRelations` doesn't work with Spark Connect ``` === Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations === '[#12]Join LeftOuter, '`==`('index, 'id) '[#12]Join LeftOuter, '`==`('index, 'id) !:- '[#9]UnresolvedRelation [test_table_1], [], false :- '[#9]SubqueryAlias spark_catalog.default.test_table_1 !+- '[#11]Project ['index, 'value_2] : +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false ! 
+- '[#10]Join Inner, '`==`('id, 'index) +- '[#11]Project ['index, 'value_2] ! :- '[#7]UnresolvedRelation [test_table_1], [], false +- '[#10]Join Inner, '`==`('id, 'index) ! +- '[#8]UnresolvedRelation [test_table_2], [], false :- '[#9]SubqueryAlias spark_catalog.default.test_table_1 ! : +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false ! +- '[#8]SubqueryAlias spark_catalog.default.test_table_2 ! +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_2`, [], false Can not resolve 'id with plan 7 ``` `[#7]UnresolvedRelation [test_table_1], [], false` was wrongly resolved to the cached one ``` :- '[#9]SubqueryAlias spark_catalog.default.test_table_1 +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false ``` ### Does this PR introduce _any_ user-facing change? yes, bug fix ### How was this patch tested? added ut ### Was this patch authored or co-authored using generative AI tooling? ci Closes #45214 from zhengruifeng/connect_fix_read_join. Authored-by: Ruifeng Zheng Signed-off-by: Dongjoon Hyun --- python/pyspark/sql/tests/test_readwriter.py| 23 +- .../spark/sql/catalyst/analysis/Analyzer.scala | 27 -- 2 files changed, 42 insertions(+), 8 deletions(-) diff --git a/python/pyspark/sql/tests/test_readwriter.py b/python/pyspark/sql/tests/test_readwriter.py index 70a320fc53b6..85057f37a181 100644 --- a/python/pyspark/sql/tests/test_readwriter.py +++ b/python/pyspark/sql/tests/test_readwriter.py @@ -20,7 +20,7 @@ import shutil import tempfile from pyspark.errors import AnalysisException -from pyspark.sql.functions import col +from pyspark.sql.functions import col, lit from pyspark.sql.readwriter import DataFrameWriterV2 from pyspark.sql.types import StructType, StructField, StringType from pyspark.testing.sqlutils import ReusedSQLTestCase @@ -181,6 +181,27 @@ class ReadwriterTestsMixin: df.write.mode("overwrite").insertInto("test_table", False) self.assertEqual(6, self.spark.sql(&quo
(spark) branch master updated: [SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3baa60afe25c [SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2 3baa60afe25c is described below commit 3baa60afe25c821ced1e956502f7c77b719f73dd Author: Dongjoon Hyun AuthorDate: Fri Feb 23 08:36:32 2024 -0800 [SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2 ### What changes were proposed in this pull request? This PR aims to upgrade Apache Ivy to 2.5.2 and protect old Ivy-based systems like old Spark from Apache Ivy 2.5.2's incompatibility by introducing a new `.ivy2.5.2` directory. - Apache Spark 4.0.0 will create this directory once and reuse it, while all other systems, like older Spark versions, use the old one, `.ivy2`. So, the behavior is the same as when Apache Spark 4.0.0 is installed and used on a new machine. - For environments with user-provided Ivy paths, the user might still hit the incompatibility. However, users can mitigate it because they already have full control over the Ivy paths. ### Why are the changes needed? This was tried once and logically reverted due to Java 11 and Java 17 failures in the Daily CIs. - #42613 - #42668 Currently, the PR Builder also fails. If this PR passes the CIs, we can achieve the following. - [Release notes](https://lists.apache.org/thread/9gcz4xrsn8c7o9gb377xfzvkb8jltffr) - FIX: CVE-2022-46751: Apache Ivy Is Vulnerable to XML External Entity Injections ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs including `HiveExternalCatalogVersionsSuite`. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45075 from dongjoon-hyun/SPARK-44914. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../main/scala/org/apache/spark/util/MavenUtils.scala | 17 ++--- .../test/scala/org/apache/spark/util/IvyTestUtils.scala | 3 ++- .../org/apache/spark/internal/config/package.scala | 4 ++-- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- dev/run-tests.py| 2 ++ docs/core-migration-guide.md| 2 ++ pom.xml | 6 +- 7 files changed, 24 insertions(+), 12 deletions(-) diff --git a/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala b/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala index 65530b7fa473..08291859a32c 100644 --- a/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala +++ b/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala @@ -324,6 +324,14 @@ private[spark] object MavenUtils extends Logging { val ivySettings: IvySettings = new IvySettings try { ivySettings.load(file) + if (ivySettings.getDefaultIvyUserDir == null && ivySettings.getDefaultCache == null) { +// To protect old Ivy-based systems like old Spark from Apache Ivy 2.5.2's incompatibility. +// `processIvyPathArg` can overwrite these later. 
+val alternateIvyDir = System.getProperty("ivy.home", + System.getProperty("user.home") + File.separator + ".ivy2.5.2") +ivySettings.setDefaultIvyUserDir(new File(alternateIvyDir)) +ivySettings.setDefaultCache(new File(alternateIvyDir, "cache")) + } } catch { case e @ (_: IOException | _: ParseException) => throw new SparkException(s"Failed when loading Ivy settings from $settingsFile", e) @@ -335,10 +343,13 @@ private[spark] object MavenUtils extends Logging { /* Set ivy settings for location of cache, if option is supplied */ private def processIvyPathArg(ivySettings: IvySettings, ivyPath: Option[String]): Unit = { -ivyPath.filterNot(_.trim.isEmpty).foreach { alternateIvyDir => - ivySettings.setDefaultIvyUserDir(new File(alternateIvyDir)) - ivySettings.setDefaultCache(new File(alternateIvyDir, "cache")) +val alternateIvyDir = ivyPath.filterNot(_.trim.isEmpty).getOrElse { + // To protect old Ivy-based systems like old Spark from Apache Ivy 2.5.2's incompatibility. + System.getProperty("ivy.home", +System.getProperty("user.home") + File.separator + ".ivy2.5.2") } +ivySettings.setDefaultIvyUserDir(new File(alternateIvyDir)) +ivySettings.setDefaultCache(new File(alternateIvyDir, "cache")) } /* Add any optional additional remote repositories */ diff --git a/common/utils/src/test/scala/org/apache/spark/util/IvyTestUtils.scala b/common/
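The Scala diff above establishes a resolution order: an explicit `spark.jars.ivy` path wins; otherwise the `ivy.home` system property; otherwise `~/.ivy2.5.2` (a new directory name, so Ivy 2.5.2's metadata cannot corrupt an older `~/.ivy2` used by earlier Spark versions). A Python sketch of that fallback chain, under the assumption that `props` stands in for JVM system properties (`resolve_ivy_dir` is an invented name):

```python
def resolve_ivy_dir(ivy_path=None, props=None, home="~"):
    # 1. A non-blank user-provided path (spark.jars.ivy) takes precedence.
    # 2. Otherwise fall back to the `ivy.home` property.
    # 3. Otherwise default to the new, version-isolated ~/.ivy2.5.2 directory.
    props = props or {}
    if ivy_path and ivy_path.strip():
        return ivy_path
    return props.get("ivy.home", home + "/.ivy2.5.2")
```

Note that, as in `processIvyPathArg`, a blank or whitespace-only user path is treated the same as no path at all.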
(spark) branch master updated (d466c0beabcf -> 09739294ba1d)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from d466c0beabcf [SPARK-47142][K8S][TESTS] Use `spark.jars.ivy` instead `spark.driver.extraJavaOptions` in `DepsTestsSuite` add 09739294ba1d [SPARK-47143][CONNECT][TESTS] Improve `ArtifactSuite` to use unique `MavenCoordinate`s No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/connect/client/ArtifactSuite.scala | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-)
(spark) branch master updated (a053b40ac0e9 -> d466c0beabcf)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from a053b40ac0e9 [SPARK-47099][SQL][FOLLOW-UP] Uses ordinalNumber in UNEXPECTED_INPUT_TYPE add d466c0beabcf [SPARK-47142][K8S][TESTS] Use `spark.jars.ivy` instead `spark.driver.extraJavaOptions` in `DepsTestsSuite` No new revisions were added by this update. Summary of changes: .../org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated: [SPARK-47137][PYTHON][CONNECT] Add getAll to spark.conf for feature parity with Scala
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 511839b6eac9 [SPARK-47137][PYTHON][CONNECT] Add getAll to spark.conf for feature parity with Scala 511839b6eac9 is described below commit 511839b6eac974351410a1713f5a90329e49abe9 Author: Takuya UESHIN AuthorDate: Thu Feb 22 20:22:43 2024 -0800 [SPARK-47137][PYTHON][CONNECT] Add getAll to spark.conf for feature parity with Scala ### What changes were proposed in this pull request? Adds `getAll` to `spark.conf` for feature parity with Scala. ```py >>> spark.conf.getAll {'spark.sql.warehouse.dir': ...} ``` ### Why are the changes needed? Scala API provides `spark.conf.getAll`; whereas Python doesn't. ```scala scala> spark.conf.getAll val res0: Map[String,String] = HashMap(spark.sql.warehouse.dir -> ... ``` ### Does this PR introduce _any_ user-facing change? Yes, `spark.conf.getAll` will be available in PySpark. ### How was this patch tested? Added the related tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45222 from ueshin/issues/SPARK-47137/getAll. 
Authored-by: Takuya UESHIN
Signed-off-by: Dongjoon Hyun
---
 python/pyspark/sql/conf.py                         | 16 +-
 python/pyspark/sql/connect/conf.py                 | 15 +-
 python/pyspark/sql/tests/test_conf.py              | 63 ++
 .../scala/org/apache/spark/sql/RuntimeConfig.scala |  6 +++
 4 files changed, 75 insertions(+), 25 deletions(-)

diff --git a/python/pyspark/sql/conf.py b/python/pyspark/sql/conf.py
index e77039565dd1..dd43991b0706 100644
--- a/python/pyspark/sql/conf.py
+++ b/python/pyspark/sql/conf.py
@@ -16,7 +16,7 @@
 #
 import sys
-from typing import Any, Optional, Union
+from typing import Any, Dict, Optional, Union

 from py4j.java_gateway import JavaObject

@@ -93,6 +93,20 @@ class RuntimeConfig:
         self._check_type(default, "default")
         return self._jconf.get(key, default)

+    @property
+    def getAll(self) -> Dict[str, str]:
+        """
+        Returns all properties set in this conf.
+
+        .. versionadded:: 4.0.0
+
+        Returns
+        -------
+        dict
+            A dictionary containing all properties set in this conf.
+        """
+        return dict(self._jconf.getAllAsJava())
+
     def unset(self, key: str) -> None:
         """
         Resets the configuration property for the given key.
diff --git a/python/pyspark/sql/connect/conf.py b/python/pyspark/sql/connect/conf.py
index 3548a31fef03..57a669aca889 100644
--- a/python/pyspark/sql/connect/conf.py
+++ b/python/pyspark/sql/connect/conf.py
@@ -19,7 +19,7 @@ from pyspark.sql.connect.utils import check_dependencies

 check_dependencies(__name__)

-from typing import Any, Optional, Union, cast
+from typing import Any, Dict, Optional, Union, cast
 import warnings

 from pyspark import _NoValue

@@ -68,6 +68,19 @@ class RuntimeConf:

     get.__doc__ = PySparkRuntimeConfig.get.__doc__

+    @property
+    def getAll(self) -> Dict[str, str]:
+        op_get_all = proto.ConfigRequest.GetAll()
+        operation = proto.ConfigRequest.Operation(get_all=op_get_all)
+        result = self._client.config(operation)
+        confs: Dict[str, str] = dict()
+        for key, value in result.pairs:
+            assert value is not None
+            confs[key] = value
+        return confs
+
+    getAll.__doc__ = PySparkRuntimeConfig.getAll.__doc__
+
     def unset(self, key: str) -> None:
         op_unset = proto.ConfigRequest.Unset(keys=[key])
         operation = proto.ConfigRequest.Operation(unset=op_unset)

diff --git a/python/pyspark/sql/tests/test_conf.py b/python/pyspark/sql/tests/test_conf.py
index 9b939205b1d1..68b147f09746 100644
--- a/python/pyspark/sql/tests/test_conf.py
+++ b/python/pyspark/sql/tests/test_conf.py
@@ -50,32 +50,49 @@ class ConfTestsMixin:
     def test_conf_with_python_objects(self):
         spark = self.spark

-        for value, expected in [(True, "true"), (False, "false")]:
-            spark.conf.set("foo", value)
-            self.assertEqual(spark.conf.get("foo"), expected)
-
-        spark.conf.set("foo", 1)
-        self.assertEqual(spark.conf.get("foo"), "1")
-
-        with self.assertRaises(IllegalArgumentException):
-            spark.conf.set("foo", None)
-
-        with self.assertRaises(Exception):
-            spark.conf.set("foo", Decimal(1))
+        try:
+            for value, expected in
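The diff above exposes `getAll` as a read-only property that snapshots the JVM-side configuration into a plain Python dict. A minimal self-contained sketch of the same pattern, using a stand-in class (not the real PySpark `RuntimeConfig`, which delegates to `self._jconf` over Py4J):

```python
# Sketch of the getAll property pattern from SPARK-47137.
# RuntimeConf here is a hypothetical stand-in; the real class wraps a JVM conf.
from typing import Dict


class RuntimeConf:
    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self._entries[key] = value

    @property
    def getAll(self) -> Dict[str, str]:
        # Return a copy so callers cannot mutate internal state, mirroring
        # dict(self._jconf.getAllAsJava()) in the actual patch.
        return dict(self._entries)


conf = RuntimeConf()
conf.set("spark.sql.shuffle.partitions", "200")
print(conf.getAll)  # {'spark.sql.shuffle.partitions': '200'}
```

Making `getAll` a property (rather than a method) matches the Scala side, where `spark.conf.getAll` is accessed without parentheses.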
(spark) branch master updated: [SPARK-43259][SQL][FOLLOWUP] Regenerate `sql-error-conditions.md` to recover `SparkThrowableSuite`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 6ae0abb64289 [SPARK-43259][SQL][FOLLOWUP] Regenerate `sql-error-conditions.md` to recover `SparkThrowableSuite`

6ae0abb64289 is described below

commit 6ae0abb64289c2124b2a2dd4043d010a06a14465
Author: Dongjoon Hyun
AuthorDate: Thu Feb 22 17:26:32 2024 -0800

[SPARK-43259][SQL][FOLLOWUP] Regenerate `sql-error-conditions.md` to recover `SparkThrowableSuite`

### What changes were proposed in this pull request?

This is a follow-up of #45095.

### Why are the changes needed?

To recover the broken `master` branch.
- https://github.com/apache/spark/actions/runs/8008631301/job/21875499011

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs. I manually verified like the following.

```
[info] SparkThrowableSuite:
[info] - No duplicate error classes (23 milliseconds)
[info] - Error classes are correctly formatted (37 milliseconds)
[info] - SQLSTATE is mandatory (1 millisecond)
[info] - Error category and error state / SQLSTATE invariants (21 milliseconds)
[info] - Message invariants (6 milliseconds)
[info] - Message format invariants (9 milliseconds)
[info] - Error classes match with document (54 milliseconds)
[info] - Round trip (23 milliseconds)
[info] - Error class names should contain only capital letters, numbers and underscores (5 milliseconds)
[info] - Check if error class is missing (14 milliseconds)
[info] - Check if message parameters match message format (2 milliseconds)
[info] - Error message is formatted (0 milliseconds)
[info] - Error message does not do substitution on values (0 milliseconds)
[info] - Try catching legacy SparkError (1 millisecond)
[info] - Try catching SparkError with error class (1 millisecond)
[info] - Try catching internal SparkError (1 millisecond)
[info] - Get message in the specified format (3 milliseconds)
[info] - overwrite error classes (47 milliseconds)
[info] - prohibit dots in error class names (15 milliseconds)
[info] Run completed in 1 second, 90 milliseconds.
[info] Total number of tests run: 19
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 19, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 7 s, completed Feb 22, 2024, 5:22:24 PM
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45226 from dongjoon-hyun/SPARK-43259.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 docs/sql-error-conditions.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md
index 0745de995799..bb982a77fca0 100644
--- a/docs/sql-error-conditions.md
+++ b/docs/sql-error-conditions.md
@@ -1148,7 +1148,7 @@ Please increase executor memory using the --executor-memory option or "`

 [SQLSTATE: 42001](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)

-Found an invalid expression encoder. Expects an instance of `ExpressionEncoder` but got ``. For more information consult '``/api/java/index.html?org/apache/spark/sql/Encoder.html'.
+Found an invalid expression encoder. Expects an instance of ExpressionEncoder but got ``. For more information consult '``/api/java/index.html?org/apache/spark/sql/Encoder.html'.

 ### INVALID_EXTRACT_BASE_FIELD_TYPE
(spark) branch master updated: [SPARK-47136][CORE][TESTS] Fix `MavenUtilsSuite` to use `MavenUtils.resolveMavenCoordinates` properly
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 9bc273ee0dad [SPARK-47136][CORE][TESTS] Fix `MavenUtilsSuite` to use `MavenUtils.resolveMavenCoordinates` properly

9bc273ee0dad is described below

commit 9bc273ee0daddef3a0d453ba6311e996bc56830d
Author: Dongjoon Hyun
AuthorDate: Thu Feb 22 15:26:01 2024 -0800

[SPARK-47136][CORE][TESTS] Fix `MavenUtilsSuite` to use `MavenUtils.resolveMavenCoordinates` properly

### What changes were proposed in this pull request?

This PR aims at the following.
1. Fix `MavenUtilsSuite` to use `MavenUtils.resolveMavenCoordinates` properly by using the `ivyPath` parameter of the `MavenUtils.loadIvySettings` method consistently.
2. Make all test cases isolated by adding `beforeEach` and `afterEach` instead of a single `beforeAll`.

### Why are the changes needed?

1. `MavenUtils` assumes to set the following together if it receives `ivyPath`.
https://github.com/apache/spark/blob/9debaeaa5a079a73605cddb90b1a77274c5284d3/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala#L337-L342
2. `MavenUtilsSuite` uses `tempIvyPath` for all `MavenUtils.resolveMavenCoordinates` calls except one test case.
https://github.com/apache/spark/blob/9debaeaa5a079a73605cddb90b1a77274c5284d3/common/utils/src/test/scala/org/apache/spark/util/MavenUtilsSuite.scala#L175-L175
3. The following is the missed case that this PR aims to fix.
https://github.com/apache/spark/blob/9debaeaa5a079a73605cddb90b1a77274c5284d3/common/utils/src/test/scala/org/apache/spark/util/MavenUtilsSuite.scala#L253

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

```
$ build/sbt "common-utils/testOnly *MavenUtilsSuite"
...
[info] MavenUtilsSuite:
[info] - incorrect maven coordinate throws error (9 milliseconds)
[info] - create repo resolvers (19 milliseconds)
[info] - create additional resolvers (7 milliseconds)
:: loading settings :: url = jar:file:/Users/dongjoon/Library/Caches/Coursier/v1/https/repo1.maven.org/maven2/org/apache/ivy/ivy/2.5.1/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
[info] - add dependencies works correctly (29 milliseconds)
[info] - excludes works correctly (2 milliseconds)
[info] - ivy path works correctly (661 milliseconds)
[info] - search for artifact at local repositories (405 milliseconds)
[info] - dependency not found throws RuntimeException (198 milliseconds)
:: loading settings :: url = jar:file:/Users/dongjoon/Library/Caches/Coursier/v1/https/repo1.maven.org/maven2/org/apache/ivy/ivy/2.5.1/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
[info] - neglects Spark and Spark's dependencies (388 milliseconds)
[info] - exclude dependencies end to end (385 milliseconds)
:: loading settings :: file = /Users/dongjoon/APACHE/spark-merge/target/tmp/ivy-9aa3863e-9dba-4002-996b-5e86b2f1281f/ivysettings.xml
[info] - load ivy settings file (103 milliseconds)
[info] - SPARK-10878: test resolution files cleaned after resolving artifact (70 milliseconds)
Spark was unable to load org/apache/spark/log4j2-defaults.properties
[info] - SPARK-34624: should ignore non-jar dependencies (247 milliseconds)
[info] Run completed in 3 seconds, 16 milliseconds.
[info] Total number of tests run: 13
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 13, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 3 s, completed Feb 22, 2024, 2:21:18 PM
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45220 from dongjoon-hyun/SPARK-47136.
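The switch from a single `beforeAll` to `beforeEach`/`afterEach` means every test case gets its own fresh temp Ivy directory instead of sharing one. The same isolation pattern, sketched with Python's `unittest` fixtures as a stand-in for ScalaTest's `BeforeAndAfterEach` (this is an illustrative analogy, not the Scala suite itself):

```python
# Per-test isolation via setUp/tearDown, analogous to beforeEach/afterEach.
import os
import shutil
import tempfile
import unittest


class IsolatedTempDirTest(unittest.TestCase):
    """Each test gets its own temp dir, so no state leaks between tests."""

    def setUp(self) -> None:
        # Like ScalaTest's beforeEach: a fresh directory for every test.
        self.tmp_ivy_path = tempfile.mkdtemp(prefix="ivy-")

    def tearDown(self) -> None:
        # Like afterEach: remove the directory so the next test starts clean.
        shutil.rmtree(self.tmp_ivy_path, ignore_errors=True)

    def test_dir_is_empty(self) -> None:
        # Under a shared beforeAll dir, leftovers from other tests could
        # make this assertion flaky; per-test setup makes it deterministic.
        self.assertEqual(os.listdir(self.tmp_ivy_path), [])

    def test_dir_is_fresh(self) -> None:
        self.assertTrue(os.path.isdir(self.tmp_ivy_path))


result = unittest.TextTestRunner(verbosity=0).run(
    unittest.TestLoader().loadTestsFromTestCase(IsolatedTempDirTest)
)
print("all isolated tests passed:", result.wasSuccessful())
```

The trade-off is a little setup cost per test in exchange for tests that cannot influence one another, which is exactly the failure mode the commit fixes.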
Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 .../scala/org/apache/spark/util/MavenUtilsSuite.scala | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/common/utils/src/test/scala/org/apache/spark/util/MavenUtilsSuite.scala b/common/utils/src/test/scala/org/apache/spark/util/MavenUtilsSuite.scala
index 642eca3cf933..d30422ca8dd5 100644
--- a/common/utils/src/test/scala/org/apache/spark/util/MavenUtilsSuite.scala
+++ b/common/utils/src/test/scala/org/apache/spark/util/MavenUtilsSuite.scala
@@ -28,14 +28,14 @@ import scala.jdk.CollectionConverters._
 import org.apache.ivy.core.module.descriptor.MDArtifact
 import org.apache.ivy.core.settings.IvySettings
 import org.apache.ivy.plugins.resolver.{AbstractResolver, ChainResolver, FileSystemResolver, IBiblioResolver}
-import org.scalatest.BeforeAndAfterAll
+import org.scalatest.BeforeAndAfterEach
 import org.scalatest.funsu