(spark) branch master updated: [SPARK-47224][PS][TESTS] Split `test_split_apply_basic` and `test_split_apply_adv`
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 944a00db6f83 [SPARK-47224][PS][TESTS] Split `test_split_apply_basic` and `test_split_apply_adv`
944a00db6f83 is described below

commit 944a00db6f83b076624d5c00cd60dba5667b4e0b
Author: Ruifeng Zheng
AuthorDate: Thu Feb 29 21:52:34 2024 +0900

    [SPARK-47224][PS][TESTS] Split `test_split_apply_basic` and `test_split_apply_adv`

    ### What changes were proposed in this pull request?
    Split `test_split_apply_basic`/`test_split_apply_adv` and their parity tests

    ### Why are the changes needed?
    it is still slow, split it for testing parallelism

    ### Does this PR introduce _any_ user-facing change?
    no

    ### How was this patch tested?
    ci

    ### Was this patch authored or co-authored using generative AI tooling?
    no

    Closes #45332 from zhengruifeng/ps_test_split_apply_basic.

    Authored-by: Ruifeng Zheng
    Signed-off-by: Hyukjin Kwon
---
 dev/sparktestsupport/modules.py                            | 16
 ...t_apply_basic.py => test_parity_split_apply_count.py}   |  8
 ...lit_apply_adv.py => test_parity_split_apply_first.py}   |  8
 ...plit_apply_adv.py => test_parity_split_apply_last.py}   |  8
 ...plit_apply_adv.py => test_parity_split_apply_skew.py}   |  8
 ...split_apply_adv.py => test_parity_split_apply_std.py}   |  8
 ...split_apply_adv.py => test_parity_split_apply_var.py}   |  8
 ...st_split_apply_basic.py => test_split_apply_count.py}   | 10 +-
 ...st_split_apply_basic.py => test_split_apply_first.py}   | 10 +-
 ...est_split_apply_basic.py => test_split_apply_last.py}   |  8
 ...{test_split_apply_adv.py => test_split_apply_skew.py}   | 10 +-
 .../{test_split_apply_adv.py => test_split_apply_std.py}   | 10 +-
 .../{test_split_apply_adv.py => test_split_apply_var.py}   | 10 +-
 13 files changed, 65 insertions(+), 57 deletions(-)

diff --git a/dev/sparktestsupport/modules.py b/dev/sparktestsupport/modules.py
index 629e74650bea..b2c64a8242de 100644
--- a/dev/sparktestsupport/modules.py
+++ b/dev/sparktestsupport/modules.py
@@ -904,9 +904,13 @@ pyspark_pandas_slow = Module(
 "pyspark.pandas.tests.groupby.test_rank",
 "pyspark.pandas.tests.groupby.test_size",
 "pyspark.pandas.tests.groupby.test_split_apply",
-"pyspark.pandas.tests.groupby.test_split_apply_adv",
-"pyspark.pandas.tests.groupby.test_split_apply_basic",
+"pyspark.pandas.tests.groupby.test_split_apply_count",
+"pyspark.pandas.tests.groupby.test_split_apply_first",
+"pyspark.pandas.tests.groupby.test_split_apply_last",
 "pyspark.pandas.tests.groupby.test_split_apply_min_max",
+"pyspark.pandas.tests.groupby.test_split_apply_skew",
+"pyspark.pandas.tests.groupby.test_split_apply_std",
+"pyspark.pandas.tests.groupby.test_split_apply_var",
 "pyspark.pandas.tests.groupby.test_stat",
 "pyspark.pandas.tests.groupby.test_stat_adv",
 "pyspark.pandas.tests.groupby.test_stat_ddof",
@@ -1180,9 +1184,13 @@ pyspark_pandas_connect_part1 = Module(
 "pyspark.pandas.tests.connect.groupby.test_parity_cumulative",
 "pyspark.pandas.tests.connect.groupby.test_parity_missing_data",
 "pyspark.pandas.tests.connect.groupby.test_parity_split_apply",
-"pyspark.pandas.tests.connect.groupby.test_parity_split_apply_adv",
-"pyspark.pandas.tests.connect.groupby.test_parity_split_apply_basic",
+"pyspark.pandas.tests.connect.groupby.test_parity_split_apply_count",
+"pyspark.pandas.tests.connect.groupby.test_parity_split_apply_first",
+"pyspark.pandas.tests.connect.groupby.test_parity_split_apply_last",
 "pyspark.pandas.tests.connect.groupby.test_parity_split_apply_min_max",
+"pyspark.pandas.tests.connect.groupby.test_parity_split_apply_skew",
+"pyspark.pandas.tests.connect.groupby.test_parity_split_apply_std",
+"pyspark.pandas.tests.connect.groupby.test_parity_split_apply_var",
 "pyspark.pandas.tests.connect.series.test_parity_datetime",
 "pyspark.pandas.tests.connect.series.test_parity_string_ops_adv",
 "pyspark.pandas.tests.connect.series.test_parity_string_ops_basic",

diff --git a/python/pyspark/pandas/tests/connect/groupby/test_parity_split_apply_basic.py b/python/pyspark/pandas/tests/connect/groupby/test_parity_split_apply_count.py
similarity index 87%
rename from python/pyspark/pandas/tests/connect/groupby/test_parity_split_apply_basic.py
rename to python/pyspark/pandas/tests/connect/groupby/test_parity_split_apply_count.py
index 2964213ab484..3e7931d1b5a0 100644
--- a/python/pyspark/pandas/tests/connect/groupby/test_parity_sp
(spark) branch master updated (944a00db6f83 -> a7825f6e8907)
dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 944a00db6f83 [SPARK-47224][PS][TESTS] Split `test_split_apply_basic` and `test_split_apply_adv`
     add a7825f6e8907 [SPARK-47227][DOCS] Improve documentation for Spark Connect

No new revisions were added by this update.

Summary of changes:
 docs/spark-connect-overview.md | 20
 1 file changed, 20 insertions(+)
(spark) branch master updated (a7825f6e8907 -> 919c19c008b8)
dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from a7825f6e8907 [SPARK-47227][DOCS] Improve documentation for Spark Connect
     add 919c19c008b8 [SPARK-47231][CORE][TESTS] FakeTask should reference its TaskMetrics to avoid TaskMetrics accumulators being GCed before stage completion

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/scheduler/FakeTask.scala | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)
(spark) branch master updated: [SPARK-41392][BUILD][TESTS] Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 28fd3de0fea0 [SPARK-41392][BUILD][TESTS] Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0
28fd3de0fea0 is described below

commit 28fd3de0fea0e952aa1494838d00185613389277
Author: yangjie01
AuthorDate: Thu Feb 29 07:56:29 2024 -0800

    [SPARK-41392][BUILD][TESTS] Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0

    ### What changes were proposed in this pull request?
    Adds bouncy-castle jdk18 artifacts to test builds in spark-sql. Based on #38974:
    * only applies the test import changes
    * dependencies are those of #44359

    ### Why are the changes needed?
    The forthcoming Hadoop 3.4.0 release doesn't export the bouncy-castle JARs, so maven builds fail.

    ### Does this PR introduce _any_ user-facing change?
    No: test-time dependency declarations only.

    ### How was this patch tested?
    This was done through the release build/test project https://github.com/apache/hadoop-release-support
    1. Latest RC2 artifacts pulled from apache maven staging
    2. Spark maven build triggered with the hadoop-version passed down
    3. The 3.3.6 release template worked with spark master (as it should!)
    4. With this change the 3.4.0 RC build worked as well

    Note: have not *yet* done a maven test run through this.

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #45317 from steveloughran/SPARK-41392-HADOOP-3.4.0.

    Authored-by: yangjie01
    Signed-off-by: Dongjoon Hyun
---
 sql/core/pom.xml | 12
 1 file changed, 12 insertions(+)

diff --git a/sql/core/pom.xml b/sql/core/pom.xml
index 8b1b51352a20..0ad9e0f690c7 100644
--- a/sql/core/pom.xml
+++ b/sql/core/pom.xml
@@ -223,6 +223,18 @@
       <artifactId>htmlunit3-driver</artifactId>
       <scope>test</scope>
+    <dependency>
+      <groupId>org.bouncycastle</groupId>
+      <artifactId>bcprov-jdk18on</artifactId>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.bouncycastle</groupId>
+      <artifactId>bcpkix-jdk18on</artifactId>
+      <scope>test</scope>
+    </dependency>
     <outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
(spark) branch master updated (28fd3de0fea0 -> 813934c69df6)
maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 28fd3de0fea0 [SPARK-41392][BUILD][TESTS] Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0
     add 813934c69df6 [SPARK-47015][SQL] Disable partitioning on collated columns

No new revisions were added by this update.

Summary of changes:
 .../src/main/resources/error/error-classes.json    | 13 +-
 docs/sql-error-conditions.md                       |  6 +
 .../spark/sql/errors/QueryCompilationErrors.scala  |  6 ++---
 .../execution/datasources/PartitioningUtils.scala  | 18 ++---
 .../spark/sql/execution/datasources/rules.scala    | 14 +-
 .../org/apache/spark/sql/CollationSuite.scala      | 30 ++
 .../org/apache/spark/sql/SQLInsertTestSuite.scala  |  2 +-
 .../spark/sql/connector/AlterTableTests.scala      |  4 +--
 8 files changed, 71 insertions(+), 22 deletions(-)
(spark) branch master updated (813934c69df6 -> 70007c59177a)
dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 813934c69df6 [SPARK-47015][SQL] Disable partitioning on collated columns
     add 70007c59177a [SPARK-47186][DOCKER][FOLLOWUP] Reduce test time for docker ITs

No new revisions were added by this update.

Summary of changes:
 .../sql/jdbc/DockerJDBCIntegrationSuite.scala | 39 --
 1 file changed, 14 insertions(+), 25 deletions(-)
(spark) branch master updated (70007c59177a -> 9ce43c85a5d2)
dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 70007c59177a [SPARK-47186][DOCKER][FOLLOWUP] Reduce test time for docker ITs
     add 9ce43c85a5d2 [SPARK-47229][CORE][SQL][SS][YARN][CONNECT] Change the never changed `var` to `val`

No new revisions were added by this update.

Summary of changes:
 .../sql/connect/planner/SparkConnectServiceSuite.scala    |  2 +-
 .../scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala   |  2 +-
 .../spark/executor/CoarseGrainedExecutorBackend.scala     |  2 +-
 core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala   |  2 +-
 .../org/apache/spark/resource/ResourceProfileSuite.scala  |  2 +-
 .../apache/spark/scheduler/TaskSchedulerImplSuite.scala   |  2 +-
 .../apache/spark/shuffle/ShuffleBlockPusherSuite.scala    |  2 +-
 .../main/scala/org/apache/spark/deploy/yarn/Client.scala  |  2 +-
 .../apache/spark/sql/catalyst/parser/AstBuilder.scala     |  2 +-
 .../spark/sql/catalyst/analysis/AnalysisErrorSuite.scala  |  8
 .../catalyst/expressions/StringExpressionsSuite.scala     | 16
 .../spark/sql/execution/streaming/state/RocksDB.scala     |  2 +-
 .../scala/org/apache/spark/sql/ConfigBehaviorSuite.scala  |  2 +-
 .../datasources/parquet/ParquetVectorizedSuite.scala      |  2 +-
 .../sql/hive/thriftserver/HiveThriftServer2Suites.scala   |  2 +-
 15 files changed, 25 insertions(+), 25 deletions(-)
(spark) branch master updated (9ce43c85a5d2 -> 7447172ffd5c)
kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 9ce43c85a5d2 [SPARK-47229][CORE][SQL][SS][YARN][CONNECT] Change the never changed `var` to `val`
     add 7447172ffd5c [SPARK-47200][SS] Error class for Foreach batch sink user function error

No new revisions were added by this update.

Summary of changes:
 .../src/main/resources/error/error-classes.json     |  6 +
 docs/sql-error-conditions.md                        |  6 +
 .../streaming/test_streaming_foreach_batch.py       |  5 -
 .../streaming/sources/ForeachBatchSink.scala        | 18 ++-
 .../sql/errors/QueryExecutionErrorsSuite.scala      |  2 +-
 .../streaming/sources/ForeachBatchSinkSuite.scala   | 26 ++
 6 files changed, 60 insertions(+), 3 deletions(-)
(spark) branch master updated (7447172ffd5c -> 32102bcbec87)
gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 7447172ffd5c [SPARK-47200][SS] Error class for Foreach batch sink user function error
     add 32102bcbec87 [SPARK-47211][CONNECT][PYTHON] Fix ignored PySpark Connect string collation

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/connect/types.py    |  4 ++--
 python/pyspark/sql/tests/test_types.py | 12
 2 files changed, 14 insertions(+), 2 deletions(-)
(spark) branch master updated: [SPARK-47218][SQL] XML: Ignore commented row tags in XML tokenizer
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new cc0ea60d6eee [SPARK-47218][SQL] XML: Ignore commented row tags in XML tokenizer
cc0ea60d6eee is described below

commit cc0ea60d6b21fb5012512f0470cb669c911d
Author: Yousof Hosny
AuthorDate: Fri Mar 1 09:53:42 2024 +0900

    [SPARK-47218][SQL] XML: Ignore commented row tags in XML tokenizer

    ### What changes were proposed in this pull request?
    Added some logic to handle comments in XMLTokenizer. Comments are ignored anytime they appear: anything enclosed in a leftmost `<!-- ... -->` will be ignored.

    Does not handle nested comments: a `<!-- ... -->` nested inside another comment will leave the last `-->` dangling.
    Does not handle comments *within* row tags: a comment placed inside the row tag markup itself will fail.

    These two cases are not considered valid XML according to the official spec (no nested comments, no comments within tags).

    ### Why are the changes needed?
    This is a bug which results in incorrect data being parsed.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Unit test covering different cases of where comments may be placed relative to row tags.

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #45325 from yhosny/xml-tokenizer.

    Lead-authored-by: Yousof Hosny
    Co-authored-by: Hyukjin Kwon
    Signed-off-by: Hyukjin Kwon
---
 .../spark/sql/catalyst/xml/StaxXmlParser.scala    | 56 ++
 .../test-data/xml-resources/commented-row.xml     | 25 ++
 .../sql/execution/datasources/xml/XmlSuite.scala  | 10
 3 files changed, 91 insertions(+)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala
index 544a761219cd..a4d15695145d 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala
@@ -624,6 +624,8 @@ class XmlTokenizer(
   private var buffer = new StringBuilder()
   private val startTag = s"<${options.rowTag}>"
   private val endTag = s"</${options.rowTag}>"
+  private val commentStart = s"<!--"
+  private val commentEnd = s"-->"

   /**
    * Finds the start of the next record.
@@ -671,9 +673,35 @@
     nextString
   }

+  private def readUntilCommentClose(): Boolean = {
+    var i = 0
+    while (true) {
+      val cOrEOF = reader.read()
+      if (cOrEOF == -1) {
+        // End of file.
+        return false
+      }
+      val c = cOrEOF.toChar
+      if (c == commentEnd(i)) {
+        if (i >= commentEnd.length - 1) {
+          // Found comment close.
+          return true
+        }
+        i += 1
+      } else {
+        i = 0
+      }
+    }
+
+    // Unreachable (a comment tag must close)
+    false
+  }
+
   private def readUntilStartElement(): Boolean = {
     currentStartTag = startTag
     var i = 0
+    var commentIdx = 0
+
     while (true) {
       val cOrEOF = reader.read()
       if (cOrEOF == -1) { // || (i == 0 && getFilePosition() > end)) {
@@ -681,6 +709,19 @@
         return false
       }
       val c = cOrEOF.toChar
+
+      if (c == commentStart(commentIdx)) {
+        if (commentIdx >= commentStart.length - 1) {
+          // If a comment begins we must ignore all characters until its end
+          commentIdx = 0
+          readUntilCommentClose()
+        } else {
+          commentIdx += 1
+        }
+      } else {
+        commentIdx = 0
+      }
+
       if (c == startTag(i)) {
         if (i >= startTag.length - 1) {
           // Found start tag.
@@ -708,6 +749,8 @@
     // Index into the start or end tag that has matched so far
     var si = 0
     var ei = 0
+    // Index into the start of a comment tag that matched so far
+    var commentIdx = 0
     // How many other start tags enclose the one that's started already?
     var depth = 0
     // Previously read character
@@ -730,6 +773,19 @@
       val c = cOrEOF.toChar
       buffer.append(c)

+      if (c == commentStart(commentIdx)) {
+        if (commentIdx >= commentStart.length - 1) {
+          // If a comment begins we must ignore everything until its end
+          buffer.setLength(buffer.length - commentStart.length)
+          commentIdx = 0
+          readUntilCommentClose()
+        } else {
+          commentIdx += 1
+        }
+      } else {
+        commentIdx = 0
+      }
+
       if (c == '>' && prevC != '/') {
         canSelfClose = false
       }

diff --git a/sql/core/src/test/resources/test-data/xml-resources/commented-row.xml b/sql/core/src/test/resources/test-data/xml
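For readers who want to see the effect end to end, a minimal usage sketch (not part of the commit): it assumes the built-in XML data source on this branch with its `rowTag` option, a predefined `spark` session as in `spark-shell`, and a made-up temp file. After this fix the commented-out row is simply skipped by the tokenizer.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

// Hypothetical input: one real row plus one row that is commented out.
val xml =
  """<ROWS>
    |  <ROW><id>1</id></ROW>
    |  <!-- <ROW><id>2</id></ROW> -->
    |</ROWS>""".stripMargin
val path = Files.createTempFile("commented-row", ".xml")
Files.write(path, xml.getBytes(StandardCharsets.UTF_8))

// With the tokenizer change above, the commented <ROW> never reaches the parser,
// so the resulting DataFrame contains only the row with id = 1.
val df = spark.read
  .format("xml")
  .option("rowTag", "ROW")
  .load(path.toString)
df.show()
```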
(spark) branch master updated (cc0ea60d6eee -> 1cd7bab5c5c2)
dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from cc0ea60d6eee [SPARK-47218][SQL] XML: Ignore commented row tags in XML tokenizer
     add 1cd7bab5c5c2 [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input

No new revisions were added by this update.

Summary of changes:
 common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java | 1 +
 1 file changed, 1 insertion(+)
(spark) branch branch-3.4 updated: [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input
dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 58a4a49389a5 [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input
58a4a49389a5 is described below

commit 58a4a49389a5f9979f7dabc5320116a212eb4bdb
Author: Dongjoon Hyun
AuthorDate: Thu Feb 29 19:08:15 2024 -0800

    [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input

    ### What changes were proposed in this pull request?
    This PR aims to fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input.

    ### Why are the changes needed?
    `deleteRecursivelyUsingJavaIO` is a fallback of `deleteRecursivelyUsingUnixNative`. We should have identical capability. Currently, it fails.

    ```
    [info]   java.nio.file.NoSuchFileException: /Users/dongjoon/APACHE/spark-merge/target/tmp/spark-e264d853-42c0-44a2-9a30-22049522b04f
    [info]   at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
    [info]   at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
    [info]   at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
    [info]   at java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
    [info]   at java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148)
    [info]   at java.base/java.nio.file.Files.readAttributes(Files.java:1851)
    [info]   at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:126)
    ```

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    It is difficult to test this `private static` Java method directly. I tested this with #45344.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #45346 from dongjoon-hyun/SPARK-47236.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 1cd7bab5c5c2bd8d595b131c88e6576486dbf123)
    Signed-off-by: Dongjoon Hyun
---
 .../src/main/java/org/apache/spark/network/util/JavaUtils.java | 1 +
 1 file changed, 1 insertion(+)

diff --git a/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java b/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java
index 7e410e9eab22..59744ec5748a 100644
--- a/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java
+++ b/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java
@@ -124,6 +124,7 @@ public class JavaUtils {
   private static void deleteRecursivelyUsingJavaIO(
       File file,
       FilenameFilter filter) throws IOException {
+    if (!file.exists()) return;
     BasicFileAttributes fileAttributes =
       Files.readAttributes(file.toPath(), BasicFileAttributes.class);
     if (fileAttributes.isDirectory() && !isSymlink(file)) {
(spark) branch branch-3.5 updated: [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input
dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 9770016b180b [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input
9770016b180b is described below

commit 9770016b180b0477060777d3739a2bfaabc6fcb3
Author: Dongjoon Hyun
AuthorDate: Thu Feb 29 19:08:15 2024 -0800

    [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input

    ### What changes were proposed in this pull request?
    This PR aims to fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input.

    ### Why are the changes needed?
    `deleteRecursivelyUsingJavaIO` is a fallback of `deleteRecursivelyUsingUnixNative`. We should have identical capability. Currently, it fails.

    ```
    [info]   java.nio.file.NoSuchFileException: /Users/dongjoon/APACHE/spark-merge/target/tmp/spark-e264d853-42c0-44a2-9a30-22049522b04f
    [info]   at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
    [info]   at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
    [info]   at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
    [info]   at java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
    [info]   at java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148)
    [info]   at java.base/java.nio.file.Files.readAttributes(Files.java:1851)
    [info]   at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:126)
    ```

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    It is difficult to test this `private static` Java method directly. I tested this with #45344.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #45346 from dongjoon-hyun/SPARK-47236.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 1cd7bab5c5c2bd8d595b131c88e6576486dbf123)
    Signed-off-by: Dongjoon Hyun
---
 common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java | 1 +
 1 file changed, 1 insertion(+)

diff --git a/common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java b/common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java
index bbe764b8366c..d6603dcbee1a 100644
--- a/common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java
+++ b/common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java
@@ -120,6 +120,7 @@ public class JavaUtils {
   private static void deleteRecursivelyUsingJavaIO(
       File file,
       FilenameFilter filter) throws IOException {
+    if (!file.exists()) return;
    BasicFileAttributes fileAttributes =
      Files.readAttributes(file.toPath(), BasicFileAttributes.class);
    if (fileAttributes.isDirectory() && !isSymlink(file)) {
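To illustrate the behavioral difference, a sketch that is not taken from the commit: it assumes Spark's `common/utils` classes on the classpath and an arbitrary temp-directory name, and exercises the public `deleteRecursively` wrapper rather than the `private static` fallback directly.

```scala
import java.io.File
import org.apache.spark.network.util.JavaUtils

// A path that is guaranteed not to exist.
val missing = new File(
  System.getProperty("java.io.tmpdir"),
  s"spark-missing-${System.nanoTime()}")
assert(!missing.exists())

// Before this fix, the java.io fallback called Files.readAttributes on the
// missing path and threw java.nio.file.NoSuchFileException; with the added
// `if (!file.exists()) return;` guard the call is simply a no-op.
JavaUtils.deleteRecursively(missing)
```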
(spark-website) branch asf-site updated: MINOR: Update the link of latest to 3.5.1
kabhwan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 53982b743a MINOR: Update the link of latest to 3.5.1
53982b743a is described below

commit 53982b743a2834472f2f028e821a543355b359da
Author: Jungtaek Lim
AuthorDate: Fri Mar 1 13:23:42 2024 +0900

    MINOR: Update the link of latest to 3.5.1
---
 site/docs/latest | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/site/docs/latest b/site/docs/latest
index e5b820341f..3c8ff8c36b 12
--- a/site/docs/latest
+++ b/site/docs/latest
@@ -1 +1 @@
-3.5.0
\ No newline at end of file
+3.5.1
\ No newline at end of file
(spark) branch master updated (1cd7bab5c5c2 -> 22f9a5a25304)
dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 1cd7bab5c5c2 [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input
     add 22f9a5a25304 [SPARK-47235][CORE][TESTS] Disable `deleteRecursivelyUsingUnixNative` in Apple Silicon test env

No new revisions were added by this update.

Summary of changes:
 .../utils/src/main/java/org/apache/spark/network/util/JavaUtils.java | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
(spark) branch branch-3.5 updated: [SPARK-47202][PYTHON][TESTS][FOLLOW-UP] Run the test only with Python 3.9+
gurwls223 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 899153039023 [SPARK-47202][PYTHON][TESTS][FOLLOW-UP] Run the test only with Python 3.9+
899153039023 is described below

commit 899153039023f2a12f43813028dcb54c9e880eda
Author: Hyukjin Kwon
AuthorDate: Thu Feb 29 10:56:26 2024 +0900

    [SPARK-47202][PYTHON][TESTS][FOLLOW-UP] Run the test only with Python 3.9+

    ### What changes were proposed in this pull request?
    This PR proposes to run the tzinfo test only with Python 3.9+. This is a followup of https://github.com/apache/spark/pull/45308.

    ### Why are the changes needed?
    To make the Python build pass with Python 3.8. It fails as below:

    ```python
    Starting test(pypy3): pyspark.sql.tests.test_arrow (temp output: /__w/spark/spark/python/target/605c2e61-b7c8-4898-ac7b-1d86f495bd4f/pypy3__pyspark.sql.tests.test_arrow__qrwyvw4l.log)
    Traceback (most recent call last):
      File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 197, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/__w/spark/spark/python/pyspark/sql/tests/test_arrow.py", line 26, in <module>
        from zoneinfo import ZoneInfo
    ModuleNotFoundError: No module named 'zoneinfo'
    ```

    https://github.com/apache/spark/actions/runs/8082492167/job/22083534905

    ### Does this PR introduce _any_ user-facing change?
    No, test-only.

    ### How was this patch tested?
    CI in this PR should test it out.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #45324 from HyukjinKwon/SPARK-47202-followup2.

    Authored-by: Hyukjin Kwon
    Signed-off-by: Hyukjin Kwon
    (cherry picked from commit 11e8ae42af9234adf56dc2f32b92e87697c777e4)
    Signed-off-by: Hyukjin Kwon
---
 python/pyspark/sql/tests/connect/test_parity_arrow.py | 2 ++
 python/pyspark/sql/tests/test_arrow.py                | 5 -
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/sql/tests/connect/test_parity_arrow.py b/python/pyspark/sql/tests/connect/test_parity_arrow.py
index 55e689da8fdd..abcf839f0fc5 100644
--- a/python/pyspark/sql/tests/connect/test_parity_arrow.py
+++ b/python/pyspark/sql/tests/connect/test_parity_arrow.py
@@ -16,6 +16,7 @@
 #
 import unittest
+import sys
 from distutils.version import LooseVersion

 import pandas as pd
@@ -136,6 +137,7 @@ class ArrowParityTests(ArrowTestsMixin, ReusedConnectTestCase, PandasOnSparkTest
     def test_toPandas_nested_timestamp(self):
         self.check_toPandas_nested_timestamp(True)

+    @unittest.skipIf(sys.version_info < (3, 9), "zoneinfo is available from Python 3.9+")
     def test_toPandas_timestmap_tzinfo(self):
         self.check_toPandas_timestmap_tzinfo(True)

diff --git a/python/pyspark/sql/tests/test_arrow.py b/python/pyspark/sql/tests/test_arrow.py
index 6e462d38d8e5..a18b777e
--- a/python/pyspark/sql/tests/test_arrow.py
+++ b/python/pyspark/sql/tests/test_arrow.py
@@ -25,7 +25,7 @@ import warnings
 from distutils.version import LooseVersion
 from typing import cast
 from collections import namedtuple
-from zoneinfo import ZoneInfo
+import sys

 from pyspark import SparkContext, SparkConf
 from pyspark.sql import Row, SparkSession
@@ -1092,6 +1092,7 @@ class ArrowTestsMixin:
         self.assertEqual(df.first(), expected)

+    @unittest.skipIf(sys.version_info < (3, 9), "zoneinfo is available from Python 3.9+")
     def test_toPandas_timestmap_tzinfo(self):
         for arrow_enabled in [True, False]:
             with self.subTest(arrow_enabled=arrow_enabled):
@@ -1099,6 +1100,8 @@ class ArrowTestsMixin:

     def check_toPandas_timestmap_tzinfo(self, arrow_enabled):
         # SPARK-47202: Test timestamp with tzinfo in toPandas and createDataFrame
+        from zoneinfo import ZoneInfo
+
         ts_tzinfo = datetime.datetime(2023, 1, 1, 0, 0, 0, tzinfo=ZoneInfo("America/Los_Angeles"))
         data = pd.DataFrame({"a": [ts_tzinfo]})
         df = self.spark.createDataFrame(data)
(spark) branch master updated (22f9a5a25304 -> a993f9dcd56f)
yao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 22f9a5a25304 [SPARK-47235][CORE][TESTS] Disable `deleteRecursivelyUsingUnixNative` in Apple Silicon test env
     add a993f9dcd56f [SPARK-42627][SPARK-26494][SQL] Support Oracle TIMESTAMP WITH LOCAL TIME ZONE

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/jdbc/OracleIntegrationSuite.scala    | 27 ++
 .../org/apache/spark/sql/jdbc/OracleDialect.scala  | 13 ---
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala      |  4 +++-
 3 files changed, 40 insertions(+), 4 deletions(-)
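A hedged usage sketch of what the new Oracle dialect support enables (not taken from the PR): the JDBC URL, credentials, and the `events` table with a `TIMESTAMP WITH LOCAL TIME ZONE` column are placeholders; the point is that such a column can now be read through the JDBC data source instead of being rejected as an unsupported Oracle type.

```scala
// Placeholder connection details for an Oracle database reachable from Spark.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1")
  .option("dbtable", "events") // assumed column: created_at TIMESTAMP WITH LOCAL TIME ZONE
  .option("user", "app_user")
  .option("password", "app_password")
  .load()

// With SPARK-42627 / SPARK-26494 the column maps to a Spark timestamp type
// rather than failing during schema resolution.
df.printSchema()
```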