(spark) branch master updated: [SPARK-47224][PS][TESTS] Split `test_split_apply_basic` and `test_split_apply_adv`
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 944a00db6f83 [SPARK-47224][PS][TESTS] Split `test_split_apply_basic` and `test_split_apply_adv`
944a00db6f83 is described below

commit 944a00db6f83b076624d5c00cd60dba5667b4e0b
Author: Ruifeng Zheng
AuthorDate: Thu Feb 29 21:52:34 2024 +0900

    [SPARK-47224][PS][TESTS] Split `test_split_apply_basic` and `test_split_apply_adv`

    ### What changes were proposed in this pull request?
    Split `test_split_apply_basic`/`test_split_apply_adv` and their parity tests

    ### Why are the changes needed?
    it is still slow, split it for testing parallelism

    ### Does this PR introduce _any_ user-facing change?
    no

    ### How was this patch tested?
    ci

    ### Was this patch authored or co-authored using generative AI tooling?
    no

    Closes #45332 from zhengruifeng/ps_test_split_apply_basic.

    Authored-by: Ruifeng Zheng
    Signed-off-by: Hyukjin Kwon
---
 dev/sparktestsupport/modules.py                            | 16
 ...t_apply_basic.py => test_parity_split_apply_count.py}   |  8
 ...lit_apply_adv.py => test_parity_split_apply_first.py}   |  8
 ...plit_apply_adv.py => test_parity_split_apply_last.py}   |  8
 ...plit_apply_adv.py => test_parity_split_apply_skew.py}   |  8
 ...split_apply_adv.py => test_parity_split_apply_std.py}   |  8
 ...split_apply_adv.py => test_parity_split_apply_var.py}   |  8
 ...st_split_apply_basic.py => test_split_apply_count.py}   | 10 +-
 ...st_split_apply_basic.py => test_split_apply_first.py}   | 10 +-
 ...est_split_apply_basic.py => test_split_apply_last.py}   |  8
 ...{test_split_apply_adv.py => test_split_apply_skew.py}   | 10 +-
 .../{test_split_apply_adv.py => test_split_apply_std.py}   | 10 +-
 .../{test_split_apply_adv.py => test_split_apply_var.py}   | 10 +-
 13 files changed, 65 insertions(+), 57 deletions(-)

diff --git a/dev/sparktestsupport/modules.py b/dev/sparktestsupport/modules.py
index 629e74650bea..b2c64a8242de 100644
--- a/dev/sparktestsupport/modules.py
+++ b/dev/sparktestsupport/modules.py
@@ -904,9 +904,13 @@ pyspark_pandas_slow = Module(
 "pyspark.pandas.tests.groupby.test_rank",
 "pyspark.pandas.tests.groupby.test_size",
 "pyspark.pandas.tests.groupby.test_split_apply",
-"pyspark.pandas.tests.groupby.test_split_apply_adv",
-"pyspark.pandas.tests.groupby.test_split_apply_basic",
+"pyspark.pandas.tests.groupby.test_split_apply_count",
+"pyspark.pandas.tests.groupby.test_split_apply_first",
+"pyspark.pandas.tests.groupby.test_split_apply_last",
 "pyspark.pandas.tests.groupby.test_split_apply_min_max",
+"pyspark.pandas.tests.groupby.test_split_apply_skew",
+"pyspark.pandas.tests.groupby.test_split_apply_std",
+"pyspark.pandas.tests.groupby.test_split_apply_var",
 "pyspark.pandas.tests.groupby.test_stat",
 "pyspark.pandas.tests.groupby.test_stat_adv",
 "pyspark.pandas.tests.groupby.test_stat_ddof",
@@ -1180,9 +1184,13 @@ pyspark_pandas_connect_part1 = Module(
 "pyspark.pandas.tests.connect.groupby.test_parity_cumulative",
 "pyspark.pandas.tests.connect.groupby.test_parity_missing_data",
 "pyspark.pandas.tests.connect.groupby.test_parity_split_apply",
-"pyspark.pandas.tests.connect.groupby.test_parity_split_apply_adv",
-"pyspark.pandas.tests.connect.groupby.test_parity_split_apply_basic",
+"pyspark.pandas.tests.connect.groupby.test_parity_split_apply_count",
+"pyspark.pandas.tests.connect.groupby.test_parity_split_apply_first",
+"pyspark.pandas.tests.connect.groupby.test_parity_split_apply_last",
 "pyspark.pandas.tests.connect.groupby.test_parity_split_apply_min_max",
+"pyspark.pandas.tests.connect.groupby.test_parity_split_apply_skew",
+"pyspark.pandas.tests.connect.groupby.test_parity_split_apply_std",
+"pyspark.pandas.tests.connect.groupby.test_parity_split_apply_var",
 "pyspark.pandas.tests.connect.series.test_parity_datetime",
 "pyspark.pandas.tests.connect.series.test_parity_string_ops_adv",
 "pyspark.pandas.tests.connect.series.test_parity_string_ops_basic",

diff --git a/python/pyspark/pandas/tests/connect/groupby/test_parity_split_apply_basic.py b/python/pyspark/pandas/tests/connect/groupby/test_parity_split_apply_count.py
similarity index 87%
rename from python/pyspark/pandas/tests/connect/groupby/test_parity_split_apply_basic.py
rename to python/pyspark/pandas/tests/connect/groupby/test_parity_split_apply_count.py
index 2964213ab484..3e7931d1b5a0 100644
--- a/python/pyspark/pandas/tests/connect/groupby/test_parity_sp
(spark) branch master updated (944a00db6f83 -> a7825f6e8907)
dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 944a00db6f83 [SPARK-47224][PS][TESTS] Split `test_split_apply_basic` and `test_split_apply_adv`
     add a7825f6e8907 [SPARK-47227][DOCS] Improve documentation for Spark Connect

No new revisions were added by this update.

Summary of changes:
 docs/spark-connect-overview.md | 20
 1 file changed, 20 insertions(+)
(spark) branch master updated (a7825f6e8907 -> 919c19c008b8)
dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from a7825f6e8907 [SPARK-47227][DOCS] Improve documentation for Spark Connect
     add 919c19c008b8 [SPARK-47231][CORE][TESTS] FakeTask should reference its TaskMetrics to avoid TaskMetrics accumulators being GCed before stage completion

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/scheduler/FakeTask.scala | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)
(spark) branch master updated: [SPARK-41392][BUILD][TESTS] Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 28fd3de0fea0 [SPARK-41392][BUILD][TESTS] Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0
28fd3de0fea0 is described below

commit 28fd3de0fea0e952aa1494838d00185613389277
Author: yangjie01
AuthorDate: Thu Feb 29 07:56:29 2024 -0800

    [SPARK-41392][BUILD][TESTS] Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0

    ### What changes were proposed in this pull request?
    Adds bouncy-castle jdk18 artifacts to test builds in spark-sql. Based on #38974:
    * only applies the test import changes
    * dependencies are those of #44359

    ### Why are the changes needed?
    The forthcoming Hadoop 3.4.0 release doesn't export the bouncy-castle JARs, so maven builds fail.

    ### Does this PR introduce _any_ user-facing change?
    No: test-time dependency declarations only.

    ### How was this patch tested?
    This was done through the release build/test project https://github.com/apache/hadoop-release-support
    1. Latest RC2 artifacts pulled from apache maven staging
    2. Spark maven build triggered with the hadoop-version passed down
    3. The 3.3.6 release template worked with spark master (as it should!)
    4. With this change the 3.4.0 RC build worked as well

    Note: have not *yet* done a maven test run through this.

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #45317 from steveloughran/SPARK-41392-HADOOP-3.4.0.

    Authored-by: yangjie01
    Signed-off-by: Dongjoon Hyun
---
 sql/core/pom.xml | 12
 1 file changed, 12 insertions(+)

diff --git a/sql/core/pom.xml b/sql/core/pom.xml
index 8b1b51352a20..0ad9e0f690c7 100644
--- a/sql/core/pom.xml
+++ b/sql/core/pom.xml
@@ -223,6 +223,18 @@
       <artifactId>htmlunit3-driver</artifactId>
       <scope>test</scope>
+    <dependency>
+      <groupId>org.bouncycastle</groupId>
+      <artifactId>bcprov-jdk18on</artifactId>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.bouncycastle</groupId>
+      <artifactId>bcpkix-jdk18on</artifactId>
+      <scope>test</scope>
+    </dependency>
     <outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
(spark) branch master updated (28fd3de0fea0 -> 813934c69df6)
maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 28fd3de0fea0 [SPARK-41392][BUILD][TESTS] Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0
     add 813934c69df6 [SPARK-47015][SQL] Disable partitioning on collated columns

No new revisions were added by this update.

Summary of changes:
 .../src/main/resources/error/error-classes.json    | 13 +-
 docs/sql-error-conditions.md                       |  6 +
 .../spark/sql/errors/QueryCompilationErrors.scala  |  6 ++---
 .../execution/datasources/PartitioningUtils.scala  | 18 ++---
 .../spark/sql/execution/datasources/rules.scala    | 14 +-
 .../org/apache/spark/sql/CollationSuite.scala      | 30 ++
 .../org/apache/spark/sql/SQLInsertTestSuite.scala  |  2 +-
 .../spark/sql/connector/AlterTableTests.scala      |  4 +--
 8 files changed, 71 insertions(+), 22 deletions(-)
(spark) branch master updated (813934c69df6 -> 70007c59177a)
dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 813934c69df6 [SPARK-47015][SQL] Disable partitioning on collated columns
     add 70007c59177a [SPARK-47186][DOCKER][FOLLOWUP] Reduce test time for docker ITs

No new revisions were added by this update.

Summary of changes:
 .../sql/jdbc/DockerJDBCIntegrationSuite.scala | 39 --
 1 file changed, 14 insertions(+), 25 deletions(-)
(spark) branch master updated (70007c59177a -> 9ce43c85a5d2)
dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 70007c59177a [SPARK-47186][DOCKER][FOLLOWUP] Reduce test time for docker ITs
     add 9ce43c85a5d2 [SPARK-47229][CORE][SQL][SS][YARN][CONNECT] Change the never changed `var` to `val`

No new revisions were added by this update.

Summary of changes:
 .../sql/connect/planner/SparkConnectServiceSuite.scala    |  2 +-
 .../scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala   |  2 +-
 .../spark/executor/CoarseGrainedExecutorBackend.scala     |  2 +-
 core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala   |  2 +-
 .../org/apache/spark/resource/ResourceProfileSuite.scala  |  2 +-
 .../apache/spark/scheduler/TaskSchedulerImplSuite.scala   |  2 +-
 .../apache/spark/shuffle/ShuffleBlockPusherSuite.scala    |  2 +-
 .../main/scala/org/apache/spark/deploy/yarn/Client.scala  |  2 +-
 .../apache/spark/sql/catalyst/parser/AstBuilder.scala     |  2 +-
 .../spark/sql/catalyst/analysis/AnalysisErrorSuite.scala  |  8
 .../catalyst/expressions/StringExpressionsSuite.scala     | 16
 .../spark/sql/execution/streaming/state/RocksDB.scala     |  2 +-
 .../scala/org/apache/spark/sql/ConfigBehaviorSuite.scala  |  2 +-
 .../datasources/parquet/ParquetVectorizedSuite.scala      |  2 +-
 .../sql/hive/thriftserver/HiveThriftServer2Suites.scala   |  2 +-
 15 files changed, 25 insertions(+), 25 deletions(-)
(spark) branch master updated (9ce43c85a5d2 -> 7447172ffd5c)
kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 9ce43c85a5d2 [SPARK-47229][CORE][SQL][SS][YARN][CONNECT] Change the never changed `var` to `val`
     add 7447172ffd5c [SPARK-47200][SS] Error class for Foreach batch sink user function error

No new revisions were added by this update.

Summary of changes:
 .../src/main/resources/error/error-classes.json     |  6 +
 docs/sql-error-conditions.md                        |  6 +
 .../streaming/test_streaming_foreach_batch.py       |  5 -
 .../streaming/sources/ForeachBatchSink.scala        | 18 ++-
 .../sql/errors/QueryExecutionErrorsSuite.scala      |  2 +-
 .../streaming/sources/ForeachBatchSinkSuite.scala   | 26 ++
 6 files changed, 60 insertions(+), 3 deletions(-)
(spark) branch master updated (7447172ffd5c -> 32102bcbec87)
gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 7447172ffd5c [SPARK-47200][SS] Error class for Foreach batch sink user function error
     add 32102bcbec87 [SPARK-47211][CONNECT][PYTHON] Fix ignored PySpark Connect string collation

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/connect/types.py    |  4 ++--
 python/pyspark/sql/tests/test_types.py | 12
 2 files changed, 14 insertions(+), 2 deletions(-)
(spark) branch master updated: [SPARK-47218][SQL] XML: Ignore commented row tags in XML tokenizer
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new cc0ea60d6eee [SPARK-47218][SQL] XML: Ignore commented row tags in XML tokenizer
cc0ea60d6eee is described below

commit cc0ea60d6b21fb5012512f0470cb669c911d
Author: Yousof Hosny
AuthorDate: Fri Mar 1 09:53:42 2024 +0900

    [SPARK-47218][SQL] XML: Ignore commented row tags in XML tokenizer

    ### What changes were proposed in this pull request?
    Added some logic to handle comments in XMLTokenizer. Comments are ignored anytime they appear: anything enclosed in a leftmost `<!-- ... -->` will be ignored.

    Does not handle nested comments: a `<!-- ... -->` nested inside another comment will leave the last `-->` dangling.
    Does not handle comments *within* row tags: a comment placed inside the row tag markup itself will fail.

    These two cases are not considered valid XML according to the official spec (no nested comments, no comments within tags).

    ### Why are the changes needed?
    This is a bug which results in incorrect data being parsed.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Unit test covering different cases of where comments may be placed relative to row tags.

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #45325 from yhosny/xml-tokenizer.

    Lead-authored-by: Yousof Hosny
    Co-authored-by: Hyukjin Kwon
    Signed-off-by: Hyukjin Kwon
---
 .../spark/sql/catalyst/xml/StaxXmlParser.scala    | 56 ++
 .../test-data/xml-resources/commented-row.xml     | 25 ++
 .../sql/execution/datasources/xml/XmlSuite.scala  | 10
 3 files changed, 91 insertions(+)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala
index 544a761219cd..a4d15695145d 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala
@@ -624,6 +624,8 @@ class XmlTokenizer(
   private var buffer = new StringBuilder()
   private val startTag = s"<${options.rowTag}>"
   private val endTag = s"</${options.rowTag}>"
+  private val commentStart = s"<!--"
+  private val commentEnd = s"-->"

   /**
    * Finds the start of the next record.
@@ -671,9 +673,35 @@
     nextString
   }

+  private def readUntilCommentClose(): Boolean = {
+    var i = 0
+    while (true) {
+      val cOrEOF = reader.read()
+      if (cOrEOF == -1) {
+        // End of file.
+        return false
+      }
+      val c = cOrEOF.toChar
+      if (c == commentEnd(i)) {
+        if (i >= commentEnd.length - 1) {
+          // Found comment close.
+          return true
+        }
+        i += 1
+      } else {
+        i = 0
+      }
+    }
+
+    // Unreachable (a comment tag must close)
+    false
+  }
+
   private def readUntilStartElement(): Boolean = {
     currentStartTag = startTag
     var i = 0
+    var commentIdx = 0
+
     while (true) {
       val cOrEOF = reader.read()
       if (cOrEOF == -1) { // || (i == 0 && getFilePosition() > end)) {
@@ -681,6 +709,19 @@
         return false
       }
       val c = cOrEOF.toChar
+
+      if (c == commentStart(commentIdx)) {
+        if (commentIdx >= commentStart.length - 1) {
+          // If a comment begins we must ignore all characters until its end
+          commentIdx = 0
+          readUntilCommentClose()
+        } else {
+          commentIdx += 1
+        }
+      } else {
+        commentIdx = 0
+      }
+
       if (c == startTag(i)) {
         if (i >= startTag.length - 1) {
           // Found start tag.
@@ -708,6 +749,8 @@
     // Index into the start or end tag that has matched so far
     var si = 0
     var ei = 0
+    // Index into the start of a comment tag that matched so far
+    var commentIdx = 0
     // How many other start tags enclose the one that's started already?
     var depth = 0
     // Previously read character
@@ -730,6 +773,19 @@
       val c = cOrEOF.toChar
       buffer.append(c)

+      if (c == commentStart(commentIdx)) {
+        if (commentIdx >= commentStart.length - 1) {
+          // If a comment begins we must ignore everything until its end
+          buffer.setLength(buffer.length - commentStart.length)
+          commentIdx = 0
+          readUntilCommentClose()
+        } else {
+          commentIdx += 1
+        }
+      } else {
+        commentIdx = 0
+      }
+
       if (c == '>' && prevC != '/') {
         canSelfClose = false
       }

diff --git a/sql/core/src/test/resources/test-data/xml-resources/commented-row.xml b/sql/core/src/test/resources/test-data/xml
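For readers who want to see the effect end to end, a minimal usage sketch (not part of the commit): it assumes the built-in XML data source on this branch with its `rowTag` option, a predefined `spark` session as in `spark-shell`, and a made-up temp file. After this fix the commented-out row is simply skipped by the tokenizer.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

// Hypothetical input: one real row plus one row that is commented out.
val xml =
  """<ROWS>
    |  <ROW><id>1</id></ROW>
    |  <!-- <ROW><id>2</id></ROW> -->
    |</ROWS>""".stripMargin
val path = Files.createTempFile("commented-row", ".xml")
Files.write(path, xml.getBytes(StandardCharsets.UTF_8))

// With the tokenizer change above, the commented <ROW> never reaches the parser,
// so the resulting DataFrame contains only the row with id = 1.
val df = spark.read
  .format("xml")
  .option("rowTag", "ROW")
  .load(path.toString)
df.show()
```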
(spark) branch master updated (cc0ea60d6eee -> 1cd7bab5c5c2)
dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from cc0ea60d6eee [SPARK-47218][SQL] XML: Ignore commented row tags in XML tokenizer
     add 1cd7bab5c5c2 [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input

No new revisions were added by this update.

Summary of changes:
 common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java | 1 +
 1 file changed, 1 insertion(+)
(spark) branch branch-3.4 updated: [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input
dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 58a4a49389a5 [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input
58a4a49389a5 is described below

commit 58a4a49389a5f9979f7dabc5320116a212eb4bdb
Author: Dongjoon Hyun
AuthorDate: Thu Feb 29 19:08:15 2024 -0800

    [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input

    ### What changes were proposed in this pull request?
    This PR aims to fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input.

    ### Why are the changes needed?
    `deleteRecursivelyUsingJavaIO` is a fallback of `deleteRecursivelyUsingUnixNative`. We should have identical capability. Currently, it fails.

    ```
    [info]   java.nio.file.NoSuchFileException: /Users/dongjoon/APACHE/spark-merge/target/tmp/spark-e264d853-42c0-44a2-9a30-22049522b04f
    [info]   at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
    [info]   at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
    [info]   at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
    [info]   at java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
    [info]   at java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148)
    [info]   at java.base/java.nio.file.Files.readAttributes(Files.java:1851)
    [info]   at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:126)
    ```

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    It is difficult to test this `private static` Java method directly. I tested this with #45344.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #45346 from dongjoon-hyun/SPARK-47236.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 1cd7bab5c5c2bd8d595b131c88e6576486dbf123)
    Signed-off-by: Dongjoon Hyun
---
 .../src/main/java/org/apache/spark/network/util/JavaUtils.java | 1 +
 1 file changed, 1 insertion(+)

diff --git a/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java b/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java
index 7e410e9eab22..59744ec5748a 100644
--- a/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java
+++ b/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java
@@ -124,6 +124,7 @@ public class JavaUtils {
   private static void deleteRecursivelyUsingJavaIO(
       File file,
       FilenameFilter filter) throws IOException {
+    if (!file.exists()) return;
     BasicFileAttributes fileAttributes =
       Files.readAttributes(file.toPath(), BasicFileAttributes.class);
     if (fileAttributes.isDirectory() && !isSymlink(file)) {
(spark) branch branch-3.5 updated: [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input
dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 9770016b180b [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input
9770016b180b is described below

commit 9770016b180b0477060777d3739a2bfaabc6fcb3
Author: Dongjoon Hyun
AuthorDate: Thu Feb 29 19:08:15 2024 -0800

    [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input

    ### What changes were proposed in this pull request?
    This PR aims to fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input.

    ### Why are the changes needed?
    `deleteRecursivelyUsingJavaIO` is a fallback of `deleteRecursivelyUsingUnixNative`. We should have identical capability. Currently, it fails.

    ```
    [info]   java.nio.file.NoSuchFileException: /Users/dongjoon/APACHE/spark-merge/target/tmp/spark-e264d853-42c0-44a2-9a30-22049522b04f
    [info]   at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
    [info]   at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
    [info]   at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
    [info]   at java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
    [info]   at java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148)
    [info]   at java.base/java.nio.file.Files.readAttributes(Files.java:1851)
    [info]   at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:126)
    ```

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    It is difficult to test this `private static` Java method directly. I tested this with #45344.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #45346 from dongjoon-hyun/SPARK-47236.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 1cd7bab5c5c2bd8d595b131c88e6576486dbf123)
    Signed-off-by: Dongjoon Hyun
---
 common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java | 1 +
 1 file changed, 1 insertion(+)

diff --git a/common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java b/common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java
index bbe764b8366c..d6603dcbee1a 100644
--- a/common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java
+++ b/common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java
@@ -120,6 +120,7 @@ public class JavaUtils {
   private static void deleteRecursivelyUsingJavaIO(
       File file,
       FilenameFilter filter) throws IOException {
+    if (!file.exists()) return;
    BasicFileAttributes fileAttributes =
      Files.readAttributes(file.toPath(), BasicFileAttributes.class);
    if (fileAttributes.isDirectory() && !isSymlink(file)) {
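To illustrate the behavioral difference, a sketch that is not taken from the commit: it assumes Spark's `common/utils` classes on the classpath and an arbitrary temp-directory name, and exercises the public `deleteRecursively` wrapper rather than the `private static` fallback directly.

```scala
import java.io.File
import org.apache.spark.network.util.JavaUtils

// A path that is guaranteed not to exist.
val missing = new File(
  System.getProperty("java.io.tmpdir"),
  s"spark-missing-${System.nanoTime()}")
assert(!missing.exists())

// Before this fix, the java.io fallback called Files.readAttributes on the
// missing path and threw java.nio.file.NoSuchFileException; with the added
// `if (!file.exists()) return;` guard the call is simply a no-op.
JavaUtils.deleteRecursively(missing)
```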
(spark-website) branch asf-site updated: MINOR: Update the link of latest to 3.5.1
kabhwan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 53982b743a MINOR: Update the link of latest to 3.5.1
53982b743a is described below

commit 53982b743a2834472f2f028e821a543355b359da
Author: Jungtaek Lim
AuthorDate: Fri Mar 1 13:23:42 2024 +0900

    MINOR: Update the link of latest to 3.5.1
---
 site/docs/latest | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/site/docs/latest b/site/docs/latest
index e5b820341f..3c8ff8c36b 12
--- a/site/docs/latest
+++ b/site/docs/latest
@@ -1 +1 @@
-3.5.0
\ No newline at end of file
+3.5.1
\ No newline at end of file
(spark) branch master updated (1cd7bab5c5c2 -> 22f9a5a25304)
dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 1cd7bab5c5c2 [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input
     add 22f9a5a25304 [SPARK-47235][CORE][TESTS] Disable `deleteRecursivelyUsingUnixNative` in Apple Silicon test env

No new revisions were added by this update.

Summary of changes:
 .../utils/src/main/java/org/apache/spark/network/util/JavaUtils.java | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
(spark) branch branch-3.5 updated: [SPARK-47202][PYTHON][TESTS][FOLLOW-UP] Run the test only with Python 3.9+
gurwls223 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 899153039023 [SPARK-47202][PYTHON][TESTS][FOLLOW-UP] Run the test only with Python 3.9+
899153039023 is described below

commit 899153039023f2a12f43813028dcb54c9e880eda
Author: Hyukjin Kwon
AuthorDate: Thu Feb 29 10:56:26 2024 +0900

    [SPARK-47202][PYTHON][TESTS][FOLLOW-UP] Run the test only with Python 3.9+

    ### What changes were proposed in this pull request?
    This PR proposes to run the tzinfo test only with Python 3.9+. This is a followup of https://github.com/apache/spark/pull/45308.

    ### Why are the changes needed?
    To make the Python build pass with Python 3.8. It fails as below:

    ```python
    Starting test(pypy3): pyspark.sql.tests.test_arrow (temp output: /__w/spark/spark/python/target/605c2e61-b7c8-4898-ac7b-1d86f495bd4f/pypy3__pyspark.sql.tests.test_arrow__qrwyvw4l.log)
    Traceback (most recent call last):
      File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 197, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/__w/spark/spark/python/pyspark/sql/tests/test_arrow.py", line 26, in <module>
        from zoneinfo import ZoneInfo
    ModuleNotFoundError: No module named 'zoneinfo'
    ```

    https://github.com/apache/spark/actions/runs/8082492167/job/22083534905

    ### Does this PR introduce _any_ user-facing change?
    No, test-only.

    ### How was this patch tested?
    CI in this PR should test it out.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #45324 from HyukjinKwon/SPARK-47202-followup2.

    Authored-by: Hyukjin Kwon
    Signed-off-by: Hyukjin Kwon
    (cherry picked from commit 11e8ae42af9234adf56dc2f32b92e87697c777e4)
    Signed-off-by: Hyukjin Kwon
---
 python/pyspark/sql/tests/connect/test_parity_arrow.py | 2 ++
 python/pyspark/sql/tests/test_arrow.py                | 5 -
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/sql/tests/connect/test_parity_arrow.py b/python/pyspark/sql/tests/connect/test_parity_arrow.py
index 55e689da8fdd..abcf839f0fc5 100644
--- a/python/pyspark/sql/tests/connect/test_parity_arrow.py
+++ b/python/pyspark/sql/tests/connect/test_parity_arrow.py
@@ -16,6 +16,7 @@
 #
 import unittest
+import sys
 from distutils.version import LooseVersion

 import pandas as pd
@@ -136,6 +137,7 @@ class ArrowParityTests(ArrowTestsMixin, ReusedConnectTestCase, PandasOnSparkTest
     def test_toPandas_nested_timestamp(self):
         self.check_toPandas_nested_timestamp(True)

+    @unittest.skipIf(sys.version_info < (3, 9), "zoneinfo is available from Python 3.9+")
     def test_toPandas_timestmap_tzinfo(self):
         self.check_toPandas_timestmap_tzinfo(True)

diff --git a/python/pyspark/sql/tests/test_arrow.py b/python/pyspark/sql/tests/test_arrow.py
index 6e462d38d8e5..a18b777e
--- a/python/pyspark/sql/tests/test_arrow.py
+++ b/python/pyspark/sql/tests/test_arrow.py
@@ -25,7 +25,7 @@ import warnings
 from distutils.version import LooseVersion
 from typing import cast
 from collections import namedtuple
-from zoneinfo import ZoneInfo
+import sys

 from pyspark import SparkContext, SparkConf
 from pyspark.sql import Row, SparkSession
@@ -1092,6 +1092,7 @@ class ArrowTestsMixin:
         self.assertEqual(df.first(), expected)

+    @unittest.skipIf(sys.version_info < (3, 9), "zoneinfo is available from Python 3.9+")
     def test_toPandas_timestmap_tzinfo(self):
         for arrow_enabled in [True, False]:
             with self.subTest(arrow_enabled=arrow_enabled):
@@ -1099,6 +1100,8 @@ class ArrowTestsMixin:

     def check_toPandas_timestmap_tzinfo(self, arrow_enabled):
         # SPARK-47202: Test timestamp with tzinfo in toPandas and createDataFrame
+        from zoneinfo import ZoneInfo
+
         ts_tzinfo = datetime.datetime(2023, 1, 1, 0, 0, 0, tzinfo=ZoneInfo("America/Los_Angeles"))
         data = pd.DataFrame({"a": [ts_tzinfo]})
         df = self.spark.createDataFrame(data)
(spark) branch master updated (22f9a5a25304 -> a993f9dcd56f)
yao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 22f9a5a25304 [SPARK-47235][CORE][TESTS] Disable `deleteRecursivelyUsingUnixNative` in Apple Silicon test env
     add a993f9dcd56f [SPARK-42627][SPARK-26494][SQL] Support Oracle TIMESTAMP WITH LOCAL TIME ZONE

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/jdbc/OracleIntegrationSuite.scala    | 27 ++
 .../org/apache/spark/sql/jdbc/OracleDialect.scala  | 13 ---
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala      |  4 +++-
 3 files changed, 40 insertions(+), 4 deletions(-)
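A hedged usage sketch of what the new Oracle dialect support enables (not taken from the PR): the JDBC URL, credentials, and the `events` table with a `TIMESTAMP WITH LOCAL TIME ZONE` column are placeholders; the point is that such a column can now be read through the JDBC data source instead of being rejected as an unsupported Oracle type.

```scala
// Placeholder connection details for an Oracle database reachable from Spark.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1")
  .option("dbtable", "events") // assumed column: created_at TIMESTAMP WITH LOCAL TIME ZONE
  .option("user", "app_user")
  .option("password", "app_password")
  .load()

// With SPARK-42627 / SPARK-26494 the column maps to a Spark timestamp type
// rather than failing during schema resolution.
df.printSchema()
```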