[spark] branch master updated: [SPARK-45333][CORE] Fix one unit mistake related to spark.eventLog.buffer.kb

2023-09-25 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 91b76070b05f [SPARK-45333][CORE] Fix one unit mistake related to 
spark.eventLog.buffer.kb
91b76070b05f is described below

commit 91b76070b05fdf026a41f22c12e404ee99bd8cc3
Author: lanmengran1 
AuthorDate: Tue Sep 26 14:56:51 2023 +0900

[SPARK-45333][CORE] Fix one unit mistake related to spark.eventLog.buffer.kb

### What changes were proposed in this pull request?

Fixing a unit mistake in the usage of configuration 
"spark.eventLog.buffer.kb"

### Why are the changes needed?

Make the event log output buffer the size the configuration actually specifies: the value of
"spark.eventLog.buffer.kb" is in KiB, so it must be converted to bytes.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

No new tests needed; this is a one-line unit-conversion fix.

Closes #42294 from amoylan2/fix_unit_mistake_in_eventLogOutputBufferSize.

Authored-by: lanmengran1 
Signed-off-by: Hyukjin Kwon 
---
 .../scala/org/apache/spark/deploy/history/EventLogFileWriters.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala 
b/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala
index 144dadf29bc3..e7eb05c85367 100644
--- 
a/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala
+++ 
b/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala
@@ -57,7 +57,7 @@ abstract class EventLogFileWriter(
   protected val shouldCompress = sparkConf.get(EVENT_LOG_COMPRESS) &&
   !sparkConf.get(EVENT_LOG_COMPRESSION_CODEC).equalsIgnoreCase("none")
   protected val shouldOverwrite = sparkConf.get(EVENT_LOG_OVERWRITE)
-  protected val outputBufferSize = 
sparkConf.get(EVENT_LOG_OUTPUT_BUFFER_SIZE).toInt
+  protected val outputBufferSize = 
sparkConf.get(EVENT_LOG_OUTPUT_BUFFER_SIZE).toInt * 1024
   protected val fileSystem = Utils.getHadoopFileSystem(logBaseDir, hadoopConf)
   protected val compressionCodec =
 if (shouldCompress) {
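
For context, a minimal sketch of the conversion this one-liner performs: a size configured in
KiB must be multiplied by 1024 before it is used as a byte count for the output stream's buffer.
The file path and the 100 KiB value below are illustrative, not Spark's defaults or config plumbing.

```
import java.io.{BufferedOutputStream, FileOutputStream}

object EventLogBufferSketch {
  def main(args: Array[String]): Unit = {
    // Illustrative stand-in for the value of spark.eventLog.buffer.kb (in KiB).
    val bufferKb = 100L

    // The fix: convert KiB to bytes before sizing the buffer. Without the
    // "* 1024", a value of 100 would mean a 100-byte buffer, not 100 KiB.
    val outputBufferSize = bufferKb.toInt * 1024

    val out = new BufferedOutputStream(
      new FileOutputStream("/tmp/eventlog-sketch.bin"), outputBufferSize)
    try out.write("sample event".getBytes("UTF-8")) finally out.close()
  }
}
```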


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r64200 - /release/spark/spark-3.2.4/

2023-09-25 Thread dongjoon
Author: dongjoon
Date: Tue Sep 26 04:26:46 2023
New Revision: 64200

Log:
Remove Apache Spark 3.2.4 due to the end-of-life

Removed:
release/spark/spark-3.2.4/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r64199 - /release/spark/spark-3.4.0/

2023-09-25 Thread dongjoon
Author: dongjoon
Date: Tue Sep 26 04:25:22 2023
New Revision: 64199

Log:
Remove Apache Spark 3.4.0 after uploading 3.4.1

Removed:
release/spark/spark-3.4.0/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r64198 - in /dev/spark: v3.5.0-rc1-bin/ v3.5.0-rc1-docs/ v3.5.0-rc2-bin/ v3.5.0-rc2-docs/ v3.5.0-rc3-bin/ v3.5.0-rc3-docs/ v3.5.0-rc4-bin/ v3.5.0-rc4-docs/

2023-09-25 Thread dongjoon
Author: dongjoon
Date: Tue Sep 26 04:24:47 2023
New Revision: 64198

Log:
Remove Apache Spark 3.5.0 RC artifacts after releasing

Removed:
dev/spark/v3.5.0-rc1-bin/
dev/spark/v3.5.0-rc1-docs/
dev/spark/v3.5.0-rc2-bin/
dev/spark/v3.5.0-rc2-docs/
dev/spark/v3.5.0-rc3-bin/
dev/spark/v3.5.0-rc3-docs/
dev/spark/v3.5.0-rc4-bin/
dev/spark/v3.5.0-rc4-docs/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r64197 - /dev/spark/v3.4.1-rc1-docs/

2023-09-25 Thread dongjoon
Author: dongjoon
Date: Tue Sep 26 04:23:28 2023
New Revision: 64197

Log:
Remove Apache Spark 3.4.1 RC1 doc after releasing

Removed:
dev/spark/v3.4.1-rc1-docs/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-45248][CORE] Set the timeout for spark ui server

2023-09-25 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 273a375cd314 [SPARK-45248][CORE] Set the timeout for spark ui server
273a375cd314 is described below

commit 273a375cd314fbf52b5f2538526374f6b24fb2cf
Author: chenyu <119398199+chenyu-opensou...@users.noreply.github.com>
AuthorDate: Mon Sep 25 22:38:27 2023 -0500

[SPARK-45248][CORE] Set the timeout for spark ui server

**What changes were proposed in this pull request?**
The PR supports setting the timeout for the Spark UI server.

**Why are the changes needed?**
It helps mitigate slow HTTP denial-of-service (slow DoS) attacks, because the Jetty server's
idle timeout is 30 seconds by default.

**Does this PR introduce any user-facing change?**
No

**How was this patch tested?**
Manual review

**Was this patch authored or co-authored using generative AI tooling?**
No

Closes #43078 from chenyu-opensource/branch-SPARK-45248-new.

Authored-by: chenyu <119398199+chenyu-opensou...@users.noreply.github.com>
Signed-off-by: Sean Owen 
---
 core/src/main/scala/org/apache/spark/ui/JettyUtils.scala | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala 
b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
index 9582bdbf5264..22adcbc32ed8 100644
--- a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
+++ b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
@@ -296,6 +296,8 @@ private[spark] object JettyUtils extends Logging {
 connector.setPort(port)
 connector.setHost(hostName)
 connector.setReuseAddress(!Utils.isWindows)
+ // spark-45248: set the idle timeout to prevent slow DoS
+connector.setIdleTimeout(8000)
 
 // Currently we only use "SelectChannelConnector"
 // Limit the max acceptor number to 8 so that we don't waste a lot of 
threads
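
For context, a hedged sketch of the Jetty `ServerConnector` API used here (Jetty 9+; the port,
host, and standalone server setup are illustrative, not Spark's `JettyUtils`):

```
import org.eclipse.jetty.server.{Server, ServerConnector}

object IdleTimeoutSketch {
  def main(args: Array[String]): Unit = {
    val server = new Server()
    val connector = new ServerConnector(server)
    connector.setPort(4040)       // illustrative port
    connector.setHost("0.0.0.0")  // illustrative bind address
    // Drop connections that stay idle for more than 8 seconds instead of
    // Jetty's 30-second default, limiting exposure to slow-DoS style clients.
    connector.setIdleTimeout(8000L)
    server.addConnector(connector)
    server.start()
    server.join()
  }
}
```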


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-45299][TESTS] Remove JDK 8 workaround in UtilsSuite

2023-09-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d49780037b51 [SPARK-45299][TESTS] Remove JDK 8 workaround in UtilsSuite
d49780037b51 is described below

commit d49780037b5169d478a505ce9e637234f3eadb67
Author: Hyukjin Kwon 
AuthorDate: Mon Sep 25 20:17:43 2023 -0700

[SPARK-45299][TESTS] Remove JDK 8 workaround in UtilsSuite

### What changes were proposed in this pull request?

This PR removes the legacy workaround for JDK 7 and below at SPARK-12486. 
The main code was cleaned up at SPARK-16182 but the test code was not cleaned 
up.

### Why are the changes needed?

To remove legacy workaround.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Fixed unittests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43084 from HyukjinKwon/SPARK-45299.

Lead-authored-by: Hyukjin Kwon 
Co-authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/util/UtilsSuite.scala   | 75 ++
 1 file changed, 33 insertions(+), 42 deletions(-)

diff --git a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala 
b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
index 2a91b45ef5b7..58ce15cfaf81 100644
--- a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
@@ -32,7 +32,7 @@ import scala.util.Random
 
 import com.google.common.io.Files
 import org.apache.commons.io.IOUtils
-import org.apache.commons.lang3.{JavaVersion, SystemUtils}
+import org.apache.commons.lang3.SystemUtils
 import org.apache.commons.math3.stat.inference.ChiSquareTest
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.Path
@@ -983,19 +983,13 @@ class UtilsSuite extends SparkFunSuite with 
ResetSystemProperties {
 // Verify that we can terminate a process even if it is in a bad state. 
This is only run
 // on UNIX since it does some OS specific things to verify the correct 
behavior.
 if (SystemUtils.IS_OS_UNIX) {
-  def getPid(p: Process): Int = {
-val f = p.getClass().getDeclaredField("pid")
-f.setAccessible(true)
-f.get(p).asInstanceOf[Int]
-  }
-
-  def pidExists(pid: Int): Boolean = {
+  def pidExists(pid: Long): Boolean = {
 val p = Runtime.getRuntime.exec(Array("kill", "-0", s"$pid"))
 p.waitFor()
 p.exitValue() == 0
   }
 
-  def signal(pid: Int, s: String): Unit = {
+  def signal(pid: Long, s: String): Unit = {
 val p = Runtime.getRuntime.exec(Array("kill", s"-$s", s"$pid"))
 p.waitFor()
   }
@@ -1003,8 +997,8 @@ class UtilsSuite extends SparkFunSuite with 
ResetSystemProperties {
   // Start up a process that runs 'sleep 10'. Terminate the process and 
assert it takes
   // less time and the process is no longer there.
   val startTimeNs = System.nanoTime()
-  val process = new ProcessBuilder("sleep", "10").start()
-  val pid = getPid(process)
+  var process = new ProcessBuilder("sleep", "10").start()
+  var pid = process.toHandle.pid()
   try {
 assert(pidExists(pid))
 val terminated = Utils.terminateProcess(process, 5000)
@@ -1018,37 +1012,34 @@ class UtilsSuite extends SparkFunSuite with 
ResetSystemProperties {
 signal(pid, "SIGKILL")
   }
 
-  if (SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_1_8)) {
-// We'll make sure that forcibly terminating a process works by
-// creating a very misbehaving process. It ignores SIGTERM and has 
been SIGSTOPed. On
-// older versions of java, this will *not* terminate.
-val file = File.createTempFile("temp-file-name", ".tmp")
-file.deleteOnExit()
-val cmd =
-  s"""
- |#!/usr/bin/env bash
- |trap "" SIGTERM
- |sleep 10
-   """.stripMargin
-Files.write(cmd.getBytes(UTF_8), file)
-file.getAbsoluteFile.setExecutable(true)
-
-val process = new ProcessBuilder(file.getAbsolutePath).start()
-val pid = getPid(process)
-assert(pidExists(pid))
-try {
-  signal(pid, "SIGSTOP")
-  val startNs = System.nanoTime()
-  val terminated = Utils.terminateProcess(process, 5000)
-  assert(terminated.isDefined)
-  process.waitFor(5, TimeUnit.SECONDS)
-  val duration = System.nanoTime() - startNs
-  // add a little extra time to allow a force kill to finish
-  assert(duration < TimeUnit.SECONDS.toNanos(6))
-  assert(!pidExists(pid))
-} finally {
-  signal(pid, "SIGKILL")
-}
+  // 

[spark] branch master updated: [SPARK-45300][SQL][TESTS] Remove JDK 8 workaround in TimestampFormatterSuite

2023-09-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 933f17d13f6e [SPARK-45300][SQL][TESTS] Remove JDK 8 workaround in 
TimestampFormatterSuite
933f17d13f6e is described below

commit 933f17d13f6e72a687a383b8fc1797a9ba700a98
Author: Hyukjin Kwon 
AuthorDate: Mon Sep 25 20:15:03 2023 -0700

[SPARK-45300][SQL][TESTS] Remove JDK 8 workaround in TimestampFormatterSuite

### What changes were proposed in this pull request?

This PR removes the legacy workaround for JDK 8 in 
https://github.com/apache/spark/pull/28736.

### Why are the changes needed?

- We still need the main code for completeness, in case future JDK versions introduce other 
differences, so this PR only fixes the tests.
- We dropped JDK 11/8 at SPARK-44112

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Fixed unittests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43085 from HyukjinKwon/SPARK-45300.

Authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 .../sql/catalyst/util/TimestampFormatterSuite.scala  | 20 ++--
 1 file changed, 2 insertions(+), 18 deletions(-)

diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
index eb173bc7f8c8..ecd849dd3af9 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
@@ -19,8 +19,6 @@ package org.apache.spark.sql.catalyst.util
 
 import java.time.{DateTimeException, LocalDateTime}
 
-import org.apache.commons.lang3.{JavaVersion, SystemUtils}
-
 import org.apache.spark.SparkUpgradeException
 import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._
 import org.apache.spark.sql.catalyst.util.DateTimeUtils._
@@ -333,14 +331,8 @@ class TimestampFormatterSuite extends 
DatetimeFormatterSuite {
   val micros1 = formatter.parse("2009-12-12 00 am")
   assert(micros1 === date(2009, 12, 12))
 
-  // JDK-8223773: DateTimeFormatter Fails to throw an Exception on Invalid 
HOUR_OF_AMPM
   // For `KK`, "12:00:00 am" is the same as "00:00:00 pm".
-  if (SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_13)) {
-intercept[DateTimeException](formatter.parse("2009-12-12 12 am"))
-  } else {
-val micros2 = formatter.parse("2009-12-12 12 am")
-assert(micros2 === date(2009, 12, 12, 12))
-  }
+  intercept[DateTimeException](formatter.parse("2009-12-12 12 am"))
 
   val micros3 = formatter.parse("2009-12-12 00 pm")
   assert(micros3 === date(2009, 12, 12, 12))
@@ -410,15 +402,7 @@ class TimestampFormatterSuite extends 
DatetimeFormatterSuite {
 val formatter = TimestampFormatter("DD", UTC, isParsing = false)
 assert(formatter.format(date(1970, 1, 3)) == "03")
 assert(formatter.format(date(1970, 4, 9)) == "99")
-
-if (SystemUtils.isJavaVersionAtMost(JavaVersion.JAVA_1_8)) {
-  // https://bugs.openjdk.java.net/browse/JDK-8079628
-  intercept[SparkUpgradeException] {
-formatter.format(date(1970, 4, 10))
-  }
-} else {
-  assert(formatter.format(date(1970, 4, 10)) == "100")
-}
+assert(formatter.format(date(1970, 4, 10)) == "100")
   }
 
   test("SPARK-32424: avoid silent data change when timestamp overflows") {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.5 updated: [SPARK-45182][CORE] Ignore task completion from old stage after retrying indeterminate stages

2023-09-25 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 252970bab65d [SPARK-45182][CORE] Ignore task completion from old stage 
after retrying indeterminate stages
252970bab65d is described below

commit 252970bab65d80020bae5f86f35d29a75fe54804
Author: mayurb 
AuthorDate: Tue Sep 26 11:04:07 2023 +0800

[SPARK-45182][CORE] Ignore task completion from old stage after retrying 
indeterminate stages

### What changes were proposed in this pull request?
[SPARK-25342](https://issues.apache.org/jira/browse/SPARK-25342) added support 
for rolling back a shuffle map stage so that all tasks of the stage can 
be retried when the stage output is indeterminate. This is done by clearing all 
map outputs at the time of stage submission. This approach works out well except 
for the following case:

Assume both Shuffle 1 and 2 are indeterminate

ShuffleMapStage1 --> Shuffle 1 --> ShuffleMapStage2 --> Shuffle 2 --> ResultStage

- ShuffleMapStage1 is complete
- A task from ShuffleMapStage2 fails with FetchFailed. Other tasks are 
still running
- Both ShuffleMapStage1 and ShuffleMapStage2 are retried
- ShuffleMapStage1 is retried and completes
- ShuffleMapStage2 reattempt is scheduled for execution
- Before all tasks of the ShuffleMapStage2 reattempt can finish, one or more 
laggard tasks from the original attempt of ShuffleMapStage2 finish and 
ShuffleMapStage2 also gets marked as complete
- Result Stage gets scheduled and finishes

After this change, such laggard tasks from the old attempt of the 
indeterminate stage will be ignored
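
A minimal standalone sketch of the check this introduces, as a reference while reading the diff
below; the `StageLite`/`TaskLite` types are illustrative stand-ins, not Spark's types — the real
logic lives in `DAGScheduler`.

```
// Illustrative stand-ins for the scheduler's bookkeeping (not Spark's types).
case class StageLite(isIndeterminate: Boolean, latestAttemptNumber: Int)
case class TaskLite(stageAttemptId: Int)

object IgnoreStaleAttemptSketch {
  // A ShuffleMapTask completion from an older attempt of an indeterminate
  // stage is ignored: its map output may not correspond to the re-computed
  // input of the retried stage.
  def shouldIgnore(stage: StageLite, task: TaskLite): Boolean =
    stage.isIndeterminate && task.stageAttemptId < stage.latestAttemptNumber

  def main(args: Array[String]): Unit = {
    // A laggard task from attempt 0 finishes after the stage was retried as attempt 1.
    assert(shouldIgnore(StageLite(isIndeterminate = true, latestAttemptNumber = 1), TaskLite(0)))
    // Determinate stages keep the old behavior and accept the completion.
    assert(!shouldIgnore(StageLite(isIndeterminate = false, latestAttemptNumber = 1), TaskLite(0)))
  }
}
```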

### Why are the changes needed?
This can give a wrong result when indeterminate stages need to be retried 
under the circumstances mentioned above

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
A new test case

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #42950 from mayurdb/rollbackFix.

Authored-by: mayurb 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 7ffc0b71aa3e416a9b21e0975a169b2a8a8403a8)
Signed-off-by: Wenchen Fan 
---
 .../org/apache/spark/scheduler/DAGScheduler.scala  |  29 +++---
 .../apache/spark/scheduler/DAGSchedulerSuite.scala | 104 +
 2 files changed, 122 insertions(+), 11 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
index fc83439454dc..d73bb6339015 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
@@ -1903,19 +1903,26 @@ private[spark] class DAGScheduler(
 
   case smt: ShuffleMapTask =>
 val shuffleStage = stage.asInstanceOf[ShuffleMapStage]
-shuffleStage.pendingPartitions -= task.partitionId
-val status = event.result.asInstanceOf[MapStatus]
-val execId = status.location.executorId
-logDebug("ShuffleMapTask finished on " + execId)
-if (executorFailureEpoch.contains(execId) &&
+// Ignore task completion for old attempt of indeterminate stage
+val ignoreIndeterminate = stage.isIndeterminate &&
+  task.stageAttemptId < stage.latestInfo.attemptNumber()
+if (!ignoreIndeterminate) {
+  shuffleStage.pendingPartitions -= task.partitionId
+  val status = event.result.asInstanceOf[MapStatus]
+  val execId = status.location.executorId
+  logDebug("ShuffleMapTask finished on " + execId)
+  if (executorFailureEpoch.contains(execId) &&
 smt.epoch <= executorFailureEpoch(execId)) {
-  logInfo(s"Ignoring possibly bogus $smt completion from executor 
$execId")
+logInfo(s"Ignoring possibly bogus $smt completion from 
executor $execId")
+  } else {
+// The epoch of the task is acceptable (i.e., the task was 
launched after the most
+// recent failure we're aware of for the executor), so mark 
the task's output as
+// available.
+mapOutputTracker.registerMapOutput(
+  shuffleStage.shuffleDep.shuffleId, smt.partitionId, status)
+  }
 } else {
-  // The epoch of the task is acceptable (i.e., the task was 
launched after the most
-  // recent failure we're aware of for the executor), so mark the 
task's output as
-  // available.
-  mapOutputTracker.registerMapOutput(
-shuffleStage.shuffleDep.shuffleId, smt.partitionId, status)
+

[spark] branch master updated: [SPARK-45182][CORE] Ignore task completion from old stage after retrying indeterminate stages

2023-09-25 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7ffc0b71aa3e [SPARK-45182][CORE] Ignore task completion from old stage 
after retrying indeterminate stages
7ffc0b71aa3e is described below

commit 7ffc0b71aa3e416a9b21e0975a169b2a8a8403a8
Author: mayurb 
AuthorDate: Tue Sep 26 11:04:07 2023 +0800

[SPARK-45182][CORE] Ignore task completion from old stage after retrying 
indeterminate stages

### What changes were proposed in this pull request?
[SPARK-25342](https://issues.apache.org/jira/browse/SPARK-25342) added support 
for rolling back a shuffle map stage so that all tasks of the stage can 
be retried when the stage output is indeterminate. This is done by clearing all 
map outputs at the time of stage submission. This approach works out well except 
for the following case:

Assume both Shuffle 1 and 2 are indeterminate

ShuffleMapStage1 --> Shuffle 1 --> ShuffleMapStage2 --> Shuffle 2 --> ResultStage

- ShuffleMapStage1 is complete
- A task from ShuffleMapStage2 fails with FetchFailed. Other tasks are 
still running
- Both ShuffleMapStage1 and ShuffleMapStage2 are retried
- ShuffleMapStage1 is retried and completes
- ShuffleMapStage2 reattempt is scheduled for execution
- Before all tasks of the ShuffleMapStage2 reattempt can finish, one or more 
laggard tasks from the original attempt of ShuffleMapStage2 finish and 
ShuffleMapStage2 also gets marked as complete
- Result Stage gets scheduled and finishes

After this change, such laggard tasks from the old attempt of the 
indeterminate stage will be ignored

### Why are the changes needed?
This can give a wrong result when indeterminate stages need to be retried 
under the circumstances mentioned above

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
A new test case

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #42950 from mayurdb/rollbackFix.

Authored-by: mayurb 
Signed-off-by: Wenchen Fan 
---
 .../org/apache/spark/scheduler/DAGScheduler.scala  |  29 +++---
 .../apache/spark/scheduler/DAGSchedulerSuite.scala | 104 +
 2 files changed, 122 insertions(+), 11 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
index 8a1480fd2100..a456f91d4c96 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
@@ -1903,19 +1903,26 @@ private[spark] class DAGScheduler(
 
   case smt: ShuffleMapTask =>
 val shuffleStage = stage.asInstanceOf[ShuffleMapStage]
-shuffleStage.pendingPartitions -= task.partitionId
-val status = event.result.asInstanceOf[MapStatus]
-val execId = status.location.executorId
-logDebug("ShuffleMapTask finished on " + execId)
-if (executorFailureEpoch.contains(execId) &&
+// Ignore task completion for old attempt of indeterminate stage
+val ignoreIndeterminate = stage.isIndeterminate &&
+  task.stageAttemptId < stage.latestInfo.attemptNumber()
+if (!ignoreIndeterminate) {
+  shuffleStage.pendingPartitions -= task.partitionId
+  val status = event.result.asInstanceOf[MapStatus]
+  val execId = status.location.executorId
+  logDebug("ShuffleMapTask finished on " + execId)
+  if (executorFailureEpoch.contains(execId) &&
 smt.epoch <= executorFailureEpoch(execId)) {
-  logInfo(s"Ignoring possibly bogus $smt completion from executor 
$execId")
+logInfo(s"Ignoring possibly bogus $smt completion from 
executor $execId")
+  } else {
+// The epoch of the task is acceptable (i.e., the task was 
launched after the most
+// recent failure we're aware of for the executor), so mark 
the task's output as
+// available.
+mapOutputTracker.registerMapOutput(
+  shuffleStage.shuffleDep.shuffleId, smt.partitionId, status)
+  }
 } else {
-  // The epoch of the task is acceptable (i.e., the task was 
launched after the most
-  // recent failure we're aware of for the executor), so mark the 
task's output as
-  // available.
-  mapOutputTracker.registerMapOutput(
-shuffleStage.shuffleDep.shuffleId, smt.partitionId, status)
+  logInfo(s"Ignoring $smt completion from an older attempt of 
indeterminate stage")
 }
 
   

[spark] branch master updated: [SPARK-45318][SHELL][TESTS] Merge test cases from `SingletonRepl2Suite/Repl2Suite` back into `SingletonReplSuite/ReplSuite`

2023-09-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 96b591abdba5 [SPARK-45318][SHELL][TESTS] Merge test cases from 
`SingletonRepl2Suite/Repl2Suite` back into `SingletonReplSuite/ReplSuite`
96b591abdba5 is described below

commit 96b591abdba58be1e3cb38c8c19885f6ceb17fd1
Author: yangjie01 
AuthorDate: Mon Sep 25 20:02:10 2023 -0700

[SPARK-45318][SHELL][TESTS] Merge test cases from 
`SingletonRepl2Suite/Repl2Suite` back into `SingletonReplSuite/ReplSuite`

### What changes were proposed in this pull request?
This pr aims to merge test cases from `SingletonRepl2Suite/Repl2Suite` back 
into `SingletonReplSuite/ReplSuite` to reduce duplicate code.

### Why are the changes needed?
https://github.com/apache/spark/pull/28545 split the relevant test cases 
from `SingletonReplSuite/ReplSuite` into `SingletonRepl2Suite/Repl2Suite` 
to keep separate test versions for Scala 2.12 and Scala 2.13.

Currently, Spark 4.0 no longer supports Scala 2.12, so they can be merged 
back into the original files to reduce duplicate code.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43104 from LuciferYang/SPARK-45318.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/repl/Repl2Suite.scala   |  46 --
 .../scala/org/apache/spark/repl/ReplSuite.scala|  24 ++-
 .../apache/spark/repl/SingletonRepl2Suite.scala| 171 -
 .../org/apache/spark/repl/SingletonReplSuite.scala |  65 
 4 files changed, 88 insertions(+), 218 deletions(-)

diff --git a/repl/src/test/scala/org/apache/spark/repl/Repl2Suite.scala 
b/repl/src/test/scala/org/apache/spark/repl/Repl2Suite.scala
deleted file mode 100644
index d55ac91e466f..
--- a/repl/src/test/scala/org/apache/spark/repl/Repl2Suite.scala
+++ /dev/null
@@ -1,46 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.repl
-
-import java.io._
-
-import org.apache.spark.{SparkContext, SparkFunSuite}
-
-class Repl2Suite extends SparkFunSuite {
-  test("propagation of local properties") {
-// A mock ILoop that doesn't install the SIGINT handler.
-class ILoop(out: PrintWriter) extends SparkILoop(null, out)
-
-val out = new StringWriter()
-Main.interp = new ILoop(new PrintWriter(out))
-Main.sparkContext = new SparkContext("local", "repl-test")
-val settings = new scala.tools.nsc.Settings
-settings.usejavacp.value = true
-Main.interp.createInterpreter(settings)
-
-Main.sparkContext.setLocalProperty("someKey", "someValue")
-
-// Make sure the value we set in the caller to interpret is propagated in 
the thread that
-// interprets the command.
-
Main.interp.interpret("org.apache.spark.repl.Main.sparkContext.getLocalProperty(\"someKey\")")
-assert(out.toString.contains("someValue"))
-
-Main.sparkContext.stop()
-System.clearProperty("spark.driver.port")
-  }
-}
diff --git a/repl/src/test/scala/org/apache/spark/repl/ReplSuite.scala 
b/repl/src/test/scala/org/apache/spark/repl/ReplSuite.scala
index bb2a85cfa0de..b9f44a707465 100644
--- a/repl/src/test/scala/org/apache/spark/repl/ReplSuite.scala
+++ b/repl/src/test/scala/org/apache/spark/repl/ReplSuite.scala
@@ -23,7 +23,7 @@ import java.nio.file.Files
 import org.apache.logging.log4j.{Level, LogManager}
 import org.apache.logging.log4j.core.{Logger, LoggerContext}
 
-import org.apache.spark.SparkFunSuite
+import org.apache.spark.{SparkContext, SparkFunSuite}
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.internal.StaticSQLConf.CATALOG_IMPLEMENTATION
@@ -398,4 +398,26 @@ class ReplSuite extends SparkFunSuite {
 assertContains(infoLogMessage2, out)
 assertContains(debugLogMessage1, out)
   }
+
+  test("propagation of local 

[spark] branch master updated: [SPARK-45312][SQL][UI] Support toggle display/hide plan svg on execution page

2023-09-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b1a0d6703bf0 [SPARK-45312][SQL][UI] Support toggle display/hide plan 
svg on execution page
b1a0d6703bf0 is described below

commit b1a0d6703bf0eb7609b394a41463ef5d02937223
Author: Kent Yao 
AuthorDate: Mon Sep 25 19:59:58 2023 -0700

[SPARK-45312][SQL][UI] Support toggle display/hide plan svg on execution 
page

### What changes were proposed in this pull request?

This PR supports toggle display/hide plan svg on the execution page.

### Why are the changes needed?

Improve UX for the execution page, especially for large plans

### Does this PR introduce _any_ user-facing change?

yes, UI changes

### How was this patch tested?

tested locally


https://github.com/apache/spark/assets/8326978/e8b7573a-20b6-4a7d-9542-b1dd62bb04db

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #43099 from yaooqinn/SPARK-45312.

Authored-by: Kent Yao 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/execution/ui/static/spark-sql-viz.js | 12 
 .../apache/spark/sql/execution/ui/ExecutionPage.scala  | 18 +-
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git 
a/sql/core/src/main/resources/org/apache/spark/sql/execution/ui/static/spark-sql-viz.js
 
b/sql/core/src/main/resources/org/apache/spark/sql/execution/ui/static/spark-sql-viz.js
index ea42877924d4..8999d6ff1fed 100644
--- 
a/sql/core/src/main/resources/org/apache/spark/sql/execution/ui/static/spark-sql-viz.js
+++ 
b/sql/core/src/main/resources/org/apache/spark/sql/execution/ui/static/spark-sql-viz.js
@@ -257,3 +257,15 @@ function onClickAdditionalMetricsCheckbox(checkboxNode) {
   }
   window.localStorage.setItem("stageId-and-taskId-checked", isChecked);
 }
+
+function togglePlanViz() {
+  const arrow = d3.select("#plan-viz-graph-arrow");
+  arrow.each(function () {
+$(this).toggleClass("arrow-open").toggleClass("arrow-closed")
+  });
+  if (arrow.classed("arrow-open")) {
+planVizContainer().style("display", "block");
+  } else {
+planVizContainer().style("display", "none");
+  }
+}
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/ExecutionPage.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/ExecutionPage.scala
index aa8fd261c58f..d1aefdb3463f 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/ExecutionPage.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/ExecutionPage.scala
@@ -77,10 +77,6 @@ class ExecutionPage(parent: SQLTab) extends 
WebUIPage("execution") with Logging
 {jobLinks(JobExecutionStatus.FAILED, "Failed Jobs:")}
   
 
-
-  
-  Show the Stage ID and Task ID that corresponds to the max 
metric
-
 
   val metrics = sqlStore.executionMetrics(executionId)
   val graph = sqlStore.planGraph(executionId)
@@ -117,7 +113,19 @@ class ExecutionPage(parent: SQLTab) extends 
WebUIPage("execution") with Logging
   graph: SparkPlanGraph): Seq[Node] = {
 
 
-  
+  
+
+  
+  Plan Visualization
+
+  
+
+  
+
+  
+  Show the Stage ID and Task ID that corresponds to the max 
metric
+
+  
   
 
   {graph.makeDotFile(metrics)}


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (fc5342314c8d -> 47bad35a4da5)

2023-09-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from fc5342314c8d [SPARK-44126][CORE] Shuffle migration failure count 
should not increase when target executor decommissioned
 add 47bad35a4da5 [SPARK-45325][BUILD] Upgrade Avro to 1.11.3

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 6 +++---
 pom.xml   | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-44126][CORE] Shuffle migration failure count should not increase when target executor decommissioned

2023-09-25 Thread wuyi
This is an automated email from the ASF dual-hosted git repository.

wuyi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new fc5342314c8d [SPARK-44126][CORE] Shuffle migration failure count 
should not increase when target executor decommissioned
fc5342314c8d is described below

commit fc5342314c8d0890cf97a808bcf5fdf3720a5864
Author: Warren Zhu 
AuthorDate: Tue Sep 26 10:52:40 2023 +0800

[SPARK-44126][CORE] Shuffle migration failure count should not increase 
when target executor decommissioned

### What changes were proposed in this pull request?
Do not increase shuffle migration failure count when target executor 
decommissioned

### Why are the changes needed?
The block manager decommissioner only syncs live peers with the block manager master every 
`spark.storage.cachedPeersTtl` (default 60s). If a block manager is decommissioned in 
between, the decommissioner still tries to migrate shuffle blocks to that 
decommissioned block manager. The migration then fails with a 
RuntimeException("BlockSavedOnDecommissionedBlockManagerException"). Detailed 
stack trace below:

```
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
at 
org.apache.spark.network.BlockTransferService.uploadBlockSync(BlockTransferService.scala:122)
at 
org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$5(BlockManagerDecommissioner.scala:127)
at 
org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$5$adapted(BlockManagerDecommissioner.scala:118)
at scala.collection.immutable.List.foreach(List.scala:431)
at 
org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.run(BlockManagerDecommissioner.scala:118)
at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.RuntimeException: 
org.apache.spark.storage.BlockSavedOnDecommissionedBlockManagerException: Block 
shuffle_2_6429_0.data cannot be saved on decommissioned executor
at 
org.apache.spark.errors.SparkCoreErrors$.cannotSaveBlockOnDecommissionedExecutorError(SparkCoreErrors.scala:238)
at 
org.apache.spark.storage.BlockManager.checkShouldStore(BlockManager.scala:277)
at 
org.apache.spark.storage.BlockManager.putBlockDataAsStream(BlockManager.scala:741)
at 
org.apache.spark.network.netty.NettyBlockRpcServer.receiveStream(NettyBlockRpcServer.scala:174)

```
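
To illustrate the idea, a hedged sketch of how a migration loop can recognize this particular
failure and skip the failure counter, mirroring the `getSimpleName` comparison introduced in the
diff below; the names here are stand-ins, not the actual `BlockManagerDecommissioner` code.

```
object DecommissionedTargetSketch {
  // Stand-in for classOf[BlockSavedOnDecommissionedBlockManagerException].getSimpleName.
  private val decommissionedTargetError =
    "BlockSavedOnDecommissionedBlockManagerException"

  // Returns true only for failures that should count toward
  // STORAGE_DECOMMISSION_MAX_REPLICATION_FAILURE_PER_BLOCK; an upload rejected
  // because the *target* executor is decommissioned is not a real failure.
  def countsAsFailure(e: Throwable): Boolean = {
    val message = Option(e.getMessage).getOrElse("")
    !message.contains(decommissionedTargetError)
  }

  def main(args: Array[String]): Unit = {
    val rejected = new RuntimeException(
      "org.apache.spark.storage.BlockSavedOnDecommissionedBlockManagerException: " +
        "Block cannot be saved on decommissioned executor")
    assert(!countsAsFailure(rejected))
    assert(countsAsFailure(new RuntimeException("connection reset")))
  }
}
```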

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added UT in `BlockManagerDecommissionUnitSuite`

Closes #41905 from warrenzhu25/migrate-decom.

Authored-by: Warren Zhu 
Signed-off-by: Yi Wu 
---
 .../spark/storage/BlockManagerDecommissioner.scala | 16 -
 .../BlockManagerDecommissionUnitSuite.scala| 40 ++
 2 files changed, 55 insertions(+), 1 deletion(-)

diff --git 
a/core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala 
b/core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala
index 59d1f3b4c4ba..cbac3fd1a994 100644
--- 
a/core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala
+++ 
b/core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala
@@ -43,6 +43,8 @@ private[storage] class BlockManagerDecommissioner(
   private val fallbackStorage = FallbackStorage.getFallbackStorage(conf)
   private val maxReplicationFailuresForDecommission =
 conf.get(config.STORAGE_DECOMMISSION_MAX_REPLICATION_FAILURE_PER_BLOCK)
+  private val blockSavedOnDecommissionedBlockManagerException =
+classOf[BlockSavedOnDecommissionedBlockManagerException].getSimpleName
 
   // Used for tracking if our migrations are complete. Readable for testing
   @volatile private[storage] var lastRDDMigrationTime: Long = 0
@@ -101,6 +103,7 @@ private[storage] class BlockManagerDecommissioner(
 try {
   val (shuffleBlockInfo, retryCount) = nextShuffleBlockToMigrate()
   val blocks = 
bm.migratableResolver.getMigrationBlocks(shuffleBlockInfo)
+  var isTargetDecommissioned = false
   // We only migrate a shuffle block when both index file and data 
file exist.
   if (blocks.isEmpty) {
 logInfo(s"Ignore deleted shuffle block $shuffleBlockInfo")
@@ -143,6 +146,11 @@ private[storage] class 

[spark] branch master updated: [SPARK-45317][SQL][CONNECT] Handle null filename in stack traces of exceptions

2023-09-25 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8230f16164b1 [SPARK-45317][SQL][CONNECT] Handle null filename in stack 
traces of exceptions
8230f16164b1 is described below

commit 8230f16164b1cbd20ca0cb052c28c9fdb8d892d1
Author: Yihong He 
AuthorDate: Tue Sep 26 11:04:14 2023 +0900

[SPARK-45317][SQL][CONNECT] Handle null filename in stack traces of 
exceptions

### What changes were proposed in this pull request?

- Handle null filename in stack traces of exceptions
- Change the filename field in protobuf to optional

### Why are the changes needed?

- In Java exceptions, the filename is the only stack-trace field that can be null, and a 
null filename may cause a NullPointerException
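
For illustration, a minimal sketch of the null-safe conversion this implies; `FrameInfo` below is
a hypothetical stand-in for the protobuf stack-trace message with its now-optional file_name, not
Spark Connect's actual generated class.

```
object NullFileNameSketch {
  // Hypothetical stand-in for the proto message with an `optional` file_name.
  final case class FrameInfo(
      declaringClass: String,
      methodName: String,
      fileName: Option[String], // None when the JVM reports a null filename
      lineNumber: Int)

  def toFrameInfo(e: StackTraceElement): FrameInfo =
    FrameInfo(
      e.getClassName,
      e.getMethodName,
      Option(e.getFileName), // getFileName may legitimately return null
      e.getLineNumber)

  def main(args: Array[String]): Unit = {
    val elem = new StackTraceElement("Example", "run", /* fileName = */ null, 42)
    assert(toFrameInfo(elem).fileName.isEmpty)
  }
}
```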

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- `build/sbt "connect-client-jvm/testOnly *ClientE2ETestSuite"`
- `build/sbt "connect-client-jvm/testOnly *ClientStreamingQuerySuite"`

### Was this patch authored or co-authored using generative AI tooling?

Closes #43103 from heyihong/SPARK-45317.

Authored-by: Yihong He 
Signed-off-by: Hyukjin Kwon 
---
 .../org/apache/spark/sql/ClientE2ETestSuite.scala  | 28 ++
 .../src/main/protobuf/spark/connect/base.proto |  2 +-
 .../connect/client/GrpcExceptionConverter.scala|  2 +-
 .../spark/sql/connect/utils/ErrorUtils.scala   | 10 +---
 .../service/FetchErrorDetailsHandlerSuite.scala| 25 +++
 python/pyspark/sql/connect/proto/base_pb2.py   | 14 +--
 python/pyspark/sql/connect/proto/base_pb2.pyi  | 13 +-
 7 files changed, 81 insertions(+), 13 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala
 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala
index ec9b1698a4ee..55718ed9c0be 100644
--- 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala
+++ 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala
@@ -45,6 +45,34 @@ import org.apache.spark.sql.types._
 
 class ClientE2ETestSuite extends RemoteSparkSession with SQLHelper with 
PrivateMethodTester {
 
+  test(s"throw SparkException with null filename in stack trace elements") {
+withSQLConf("spark.sql.connect.enrichError.enabled" -> "true") {
+  val session = spark
+  import session.implicits._
+
+  val throwException =
+udf((_: String) => {
+  val testError = new SparkException("test")
+  val stackTrace = testError.getStackTrace()
+  stackTrace(0) = new StackTraceElement(
+stackTrace(0).getClassName,
+stackTrace(0).getMethodName,
+null,
+stackTrace(0).getLineNumber)
+  testError.setStackTrace(stackTrace)
+  throw testError
+})
+
+  val ex = intercept[SparkException] {
+Seq("1").toDS.withColumn("udf_val", throwException($"value")).collect()
+  }
+
+  assert(ex.getCause.isInstanceOf[SparkException])
+  assert(ex.getCause.getStackTrace().length > 0)
+  assert(ex.getCause.getStackTrace()(0).getFileName == null)
+}
+  }
+
   for (enrichErrorEnabled <- Seq(false, true)) {
 test(s"cause exception - ${enrichErrorEnabled}") {
   withSQLConf("spark.sql.connect.enrichError.enabled" -> 
enrichErrorEnabled.toString) {
diff --git 
a/connector/connect/common/src/main/protobuf/spark/connect/base.proto 
b/connector/connect/common/src/main/protobuf/spark/connect/base.proto
index e5317cae6dc8..b30c578421c2 100644
--- a/connector/connect/common/src/main/protobuf/spark/connect/base.proto
+++ b/connector/connect/common/src/main/protobuf/spark/connect/base.proto
@@ -808,7 +808,7 @@ message FetchErrorDetailsResponse {
 string method_name = 2;
 
 // The name of the file containing the execution point.
-string file_name = 3;
+optional string file_name = 3;
 
 // The line number of the source line containing the execution point.
 int32 line_number = 4;
diff --git 
a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala
 
b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala
index edbc434ef964..2d86e8c1e417 100644
--- 
a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala
+++ 
b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala
@@ -222,7 +222,7 @@ private object GrpcExceptionConverter {
 new StackTraceElement(
   stackTraceElement.getDeclaringClass,
   

[spark] branch master updated: [SPARK-44550][SQL] Enable correctness fixes for `null IN (empty list)` under ANSI

2023-09-25 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7741fe79e20 [SPARK-44550][SQL] Enable correctness fixes for `null IN 
(empty list)` under ANSI
7741fe79e20 is described below

commit 7741fe79e201eea42ceefedd10013680736fbbea
Author: Jack Chen 
AuthorDate: Tue Sep 26 09:40:38 2023 +0800

[SPARK-44550][SQL] Enable correctness fixes for `null IN (empty list)` 
under ANSI

### What changes were proposed in this pull request?
Enables the correctness fixes for `null IN (empty list)` expressions by 
default when ANSI is enabled. Under non-ANSI the old behavior remains the 
default for now. After soaking for some time under ANSI, we should switch the 
new behavior to default in both cases.

Prior to this, `null IN (empty list)` incorrectly evaluated to null, when 
it should evaluate to false. (It should be false because `a IN (b1, b2)` is defined 
as `a = b1 OR a = b2`, and an empty IN list is treated as an empty OR, which is 
false. This is specified by ANSI SQL.)

Many places in Spark execution (In, InSet, InSubquery) and optimization 
(OptimizeIn, NullPropagation) implemented this wrong behavior. This is a 
longstanding correctness issue which has existed since null support for IN 
expressions was first added to Spark.

See previous PRs where the fixes were implemented: 
https://github.com/apache/spark/pull/42007 and 
https://github.com/apache/spark/pull/42163.

See [this 
doc](https://docs.google.com/document/d/1k8AY8oyT-GI04SnP7eXttPDnDj-Ek-c3luF2zL6DPNU/edit)
 for more information.
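
As a rough illustration (a spark-shell snippet; the view names are made up, and the observed
output depends on whether the legacy behavior is enabled):

```
// With the corrected semantics, `x IN (<empty list>)` is FALSE for every x,
// including NULL, so `x NOT IN (<empty list>)` is TRUE and the NULL row is kept.
// Under the legacy behavior, NULL IN (<empty list>) is NULL, and the NOT IN
// filter silently drops the NULL row.
spark.sql("CREATE OR REPLACE TEMP VIEW t AS SELECT * FROM VALUES (1), (CAST(NULL AS INT)) AS t(a)")
spark.sql("CREATE OR REPLACE TEMP VIEW empty_t AS SELECT * FROM VALUES (0) AS e(b) WHERE 1 = 0")

// Expected with the fix: both rows, 1 and NULL, survive the filter.
spark.sql("SELECT a FROM t WHERE a NOT IN (SELECT b FROM empty_t)").show()
```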

### Why are the changes needed?
Fix wrong SQL semantics

### Does this PR introduce _any_ user-facing change?
Yes, fix wrong SQL semantics

### How was this patch tested?
Unit tests

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43068 from jchen5/null-in-empty-enable.

Authored-by: Jack Chen 
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/catalyst/expressions/predicates.scala   |  4 ++--
 .../spark/sql/catalyst/optimizer/expressions.scala| 10 +-
 .../scala/org/apache/spark/sql/internal/SQLConf.scala |  8 +++-
 .../scala/org/apache/spark/sql/EmptyInSuite.scala | 19 +++
 4 files changed, 33 insertions(+), 8 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
index 31b872e04ce..419d11b13a2 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
@@ -469,7 +469,7 @@ case class In(value: Expression, list: Seq[Expression]) 
extends Predicate {
 
   final override val nodePatterns: Seq[TreePattern] = Seq(IN)
   private val legacyNullInEmptyBehavior =
-SQLConf.get.getConf(SQLConf.LEGACY_NULL_IN_EMPTY_LIST_BEHAVIOR)
+SQLConf.get.legacyNullInEmptyBehavior
 
   override lazy val canonicalized: Expression = {
 val basic = withNewChildren(children.map(_.canonicalized)).asInstanceOf[In]
@@ -626,7 +626,7 @@ case class InSet(child: Expression, hset: Set[Any]) extends 
UnaryExpression with
 
   final override val nodePatterns: Seq[TreePattern] = Seq(INSET)
   private val legacyNullInEmptyBehavior =
-SQLConf.get.getConf(SQLConf.LEGACY_NULL_IN_EMPTY_LIST_BEHAVIOR)
+SQLConf.get.legacyNullInEmptyBehavior
 
   override def eval(input: InternalRow): Any = {
 if (hset.isEmpty && !legacyNullInEmptyBehavior) {
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
index 8a7f54093d5..90773a1eb86 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
@@ -283,7 +283,7 @@ object OptimizeIn extends Rule[LogicalPlan] {
   case In(v, list) if list.isEmpty =>
 // IN (empty list) is always false under current behavior.
 // Under legacy behavior it's null if the left side is null, otherwise 
false (SPARK-44550).
-if (!SQLConf.get.getConf(SQLConf.LEGACY_NULL_IN_EMPTY_LIST_BEHAVIOR)) {
+if (!SQLConf.get.legacyNullInEmptyBehavior) {
   FalseLiteral
 } else {
   If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType))
@@ -845,20 +845,20 @@ object NullPropagation extends Rule[LogicalPlan] {
 
   // If the list is empty, transform the In expression to false literal.
   case In(_, list)
-if list.isEmpty && 

[spark] branch master updated (36e626bc60a -> c1b12bd5642)

2023-09-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 36e626bc60a [SPARK-43711][SPARK-44372][CONNECT][PS][TESTS] Clear 
message for Spark ML dependent tests
 add c1b12bd5642 [SPARK-45322][CORE] Use ProcessHandle to get pid directly

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/util/Utils.scala | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
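
For reference, a small sketch of the Java 9+ `ProcessHandle` API that replaces the old reflection
on the private `pid` field (assumes a Unix-like system for the `sleep` child process):

```
object PidSketch {
  def main(args: Array[String]): Unit = {
    // Pid of the current JVM; no reflection on a private "pid" field needed.
    val myPid: Long = ProcessHandle.current().pid()

    // Pid of a child process started via ProcessBuilder.
    val child = new ProcessBuilder("sleep", "1").start()
    val childPid: Long = child.toHandle.pid()

    println(s"self=$myPid child=$childPid")
    child.waitFor()
  }
}
```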


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch dependabot/maven/org.xerial.snappy-snappy-java-1.1.10.4 created (now 2b18d0c7daa)

2023-09-25 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch 
dependabot/maven/org.xerial.snappy-snappy-java-1.1.10.4
in repository https://gitbox.apache.org/repos/asf/spark.git


  at 2b18d0c7daa Bump org.xerial.snappy:snappy-java from 1.1.10.3 to 
1.1.10.4

No new revisions were added by this update.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-43711][SPARK-44372][CONNECT][PS][TESTS] Clear message for Spark ML dependent tests

2023-09-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 36e626bc60a [SPARK-43711][SPARK-44372][CONNECT][PS][TESTS] Clear 
message for Spark ML dependent tests
36e626bc60a is described below

commit 36e626bc60af4ce94a1ca304e05390418b965135
Author: Haejoon Lee 
AuthorDate: Mon Sep 25 11:23:51 2023 -0700

[SPARK-43711][SPARK-44372][CONNECT][PS][TESTS] Clear message for Spark ML 
dependent tests

### What changes were proposed in this pull request?

Similar to https://github.com/apache/spark/pull/42955, this PR proposes to 
correct the message for Spark ML only tests from Spark Connect.

### Why are the changes needed?

Among the Spark ML-dependent tests, there are some edge cases that can only be 
tested using Spark ML features. We need to be clearer about why these cannot be 
tested from Spark Connect.

### Does this PR introduce _any_ user-facing change?

No, it's test-only.

### How was this patch tested?

Updated the existing tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43051 from itholic/ml_dependent.

Authored-by: Haejoon Lee 
Signed-off-by: Dongjoon Hyun 
---
 .../pyspark/pandas/tests/connect/plot/test_parity_frame_plot.py   | 4 ++--
 .../tests/connect/plot/test_parity_frame_plot_matplotlib.py   | 4 ++--
 .../pandas/tests/connect/plot/test_parity_frame_plot_plotly.py| 6 +++---
 .../tests/connect/plot/test_parity_series_plot_matplotlib.py  | 8 
 .../pandas/tests/connect/plot/test_parity_series_plot_plotly.py   | 4 ++--
 python/pyspark/pandas/tests/connect/test_parity_default_index.py  | 4 +---
 6 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/python/pyspark/pandas/tests/connect/plot/test_parity_frame_plot.py 
b/python/pyspark/pandas/tests/connect/plot/test_parity_frame_plot.py
index 24392eaf27c..10054f58501 100644
--- a/python/pyspark/pandas/tests/connect/plot/test_parity_frame_plot.py
+++ b/python/pyspark/pandas/tests/connect/plot/test_parity_frame_plot.py
@@ -24,11 +24,11 @@ from pyspark.testing.pandasutils import 
PandasOnSparkTestUtils
 class DataFramePlotParityTests(
 DataFramePlotTestsMixin, PandasOnSparkTestUtils, ReusedConnectTestCase
 ):
-@unittest.skip("TODO(SPARK-43711): Fix Transformer.transform to work with 
Spark Connect.")
+@unittest.skip("Test depends on Spark ML which is not supported from Spark 
Connect.")
 def test_compute_hist_multi_columns(self):
 super().test_compute_hist_multi_columns()
 
-@unittest.skip("TODO(SPARK-43711): Fix Transformer.transform to work with 
Spark Connect.")
+@unittest.skip("Test depends on Spark ML which is not supported from Spark 
Connect.")
 def test_compute_hist_single_column(self):
 super().test_compute_hist_single_column()
 
diff --git 
a/python/pyspark/pandas/tests/connect/plot/test_parity_frame_plot_matplotlib.py 
b/python/pyspark/pandas/tests/connect/plot/test_parity_frame_plot_matplotlib.py
index 3f615326f2b..9fec1c57c02 100644
--- 
a/python/pyspark/pandas/tests/connect/plot/test_parity_frame_plot_matplotlib.py
+++ 
b/python/pyspark/pandas/tests/connect/plot/test_parity_frame_plot_matplotlib.py
@@ -24,11 +24,11 @@ from pyspark.testing.pandasutils import 
PandasOnSparkTestUtils, TestUtils
 class DataFramePlotMatplotlibParityTests(
 DataFramePlotMatplotlibTestsMixin, PandasOnSparkTestUtils, TestUtils, 
ReusedConnectTestCase
 ):
-@unittest.skip("TODO(SPARK-43711): Fix Transformer.transform to work with 
Spark Connect.")
+@unittest.skip("Test depends on Spark ML which is not supported from Spark 
Connect.")
 def test_hist_plot(self):
 super().test_hist_plot()
 
-@unittest.skip("TODO(SPARK-44372): Enable KernelDensity within Spark 
Connect.")
+@unittest.skip("Test depends on Spark ML which is not supported from Spark 
Connect.")
 def test_kde_plot(self):
 super().test_kde_plot()
 
diff --git 
a/python/pyspark/pandas/tests/connect/plot/test_parity_frame_plot_plotly.py 
b/python/pyspark/pandas/tests/connect/plot/test_parity_frame_plot_plotly.py
index 16b97d6814e..452962d8135 100644
--- a/python/pyspark/pandas/tests/connect/plot/test_parity_frame_plot_plotly.py
+++ b/python/pyspark/pandas/tests/connect/plot/test_parity_frame_plot_plotly.py
@@ -24,15 +24,15 @@ from pyspark.testing.pandasutils import 
PandasOnSparkTestUtils, TestUtils
 class DataFramePlotPlotlyParityTests(
 DataFramePlotPlotlyTestsMixin, PandasOnSparkTestUtils, TestUtils, 
ReusedConnectTestCase
 ):
-@unittest.skip("TODO(SPARK-43711): Fix Transformer.transform to work with 
Spark Connect.")
+@unittest.skip("Test depends on Spark ML which is not supported from Spark 
Connect.")
 def test_hist_layout_kwargs(self):
 

[spark] branch master updated (42a6557172c -> 0edeb605f2f)

2023-09-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 42a6557172c [MINOR][SQL][DOCS] Remove JDK 8 related information in the 
comments for aes_encrypt and aes_decrypt
 add 0edeb605f2f [SPARK-45304][BUILD] Remove test classloader workaround 
for SBT build

No new revisions were added by this update.

Summary of changes:
 project/SparkBuild.scala | 4 
 1 file changed, 4 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [MINOR][SQL][DOCS] Remove JDK 8 related information in the comments for aes_encrypt and aes_decrypt

2023-09-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 42a6557172c [MINOR][SQL][DOCS] Remove JDK 8 related information in the 
comments for aes_encrypt and aes_decrypt
42a6557172c is described below

commit 42a6557172cd5c7cfcd049ff95a9ad1bc8fdeaa5
Author: Hyukjin Kwon 
AuthorDate: Mon Sep 25 10:25:29 2023 -0700

[MINOR][SQL][DOCS] Remove JDK 8 related information in the comments for 
aes_encrypt and aes_decrypt

### What changes were proposed in this pull request?

This PR proposes to fix the comments in both `aes_encrypt` and 
`aes_decrypt`. A quick check of the Scala/Python/R APIs suggests this is 
the only place that needs fixing.

### Why are the changes needed?

We dropped JDK 8 at SPARK-44112

### Does this PR introduce _any_ user-facing change?

No, it's Scaladoc, and the doc is not user-facing.

### How was this patch tested?

Build in this CI should check them.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43091 from HyukjinKwon/minor-doc-AesEncrypt.

Authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 .../main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala   | 4 
 1 file changed, 4 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
index 92ed0843521..6a7f841c324 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
@@ -304,8 +304,6 @@ case class CurrentUser() extends LeafExpression with 
Unevaluable {
 
 /**
  * A function that encrypts input using AES. Key lengths of 128, 192 or 256 
bits can be used.
- * For versions prior to JDK 8u161, 192 and 256 bits keys can be used
- * if Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy 
Files are installed.
  * If either argument is NULL or the key length is not one of the permitted 
values,
  * the return value is NULL.
  */
@@ -388,8 +386,6 @@ case class AesEncrypt(
 
 /**
  * A function that decrypts input using AES. Key lengths of 128, 192 or 256 
bits can be used.
- * For versions prior to JDK 8u161, 192 and 256 bits keys can be used
- * if Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy 
Files are installed.
  * If either argument is NULL or the key length is not one of the permitted 
values,
  * the return value is NULL.
  */


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (c1c1c9fa98f -> 772802d1165)

2023-09-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from c1c1c9fa98f [SPARK-45305][SQL][TESTS] Remove JDK 8 workaround added 
TreeNodeSuite
 add 772802d1165 [SPARK-45301][BUILD] Remove org.scala-lang scala-library 
added for JDK 11 workaround

No new revisions were added by this update.

Summary of changes:
 common/network-common/pom.xml | 6 --
 1 file changed, 6 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-45305][SQL][TESTS] Remove JDK 8 workaround added TreeNodeSuite

2023-09-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c1c1c9fa98f [SPARK-45305][SQL][TESTS] Remove JDK 8 workaround added 
TreeNodeSuite
c1c1c9fa98f is described below

commit c1c1c9fa98f74fcc646214d39d2fec9dad6b5cc5
Author: Hyukjin Kwon 
AuthorDate: Mon Sep 25 10:08:32 2023 -0700

[SPARK-45305][SQL][TESTS] Remove JDK 8 workaround added TreeNodeSuite

### What changes were proposed in this pull request?

In theory, we don't need https://github.com/apache/spark/pull/29875 anymore 
because we dropped JDK 8 (according to the PR description in 
https://github.com/apache/spark/pull/29875) but `Utils.getSimpleClass` handles 
malformed class names in any event, so should be safe to keep them.

### Why are the changes needed?

To remove a test that does not run. We dropped JDK 11/8 at SPARK-44112

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

CI in this PR should test them out.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43092 from HyukjinKwon/SPARK-45305.

Authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/catalyst/trees/TreeNodeSuite.scala   | 29 --
 1 file changed, 29 deletions(-)

diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala
index 3411415bbb6..c2f7287758d 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala
@@ -864,35 +864,6 @@ class TreeNodeSuite extends SparkFunSuite with SQLHelper {
 assert(getStateful(withNestedStatefulBefore) ne 
getStateful(withNestedStatefulAfter))
   }
 
-  object MalformedClassObject extends Serializable {
-case class MalformedNameExpression(child: Expression) extends 
TaggingExpression {
-  override protected def withNewChildInternal(newChild: Expression): 
Expression =
-copy(child = newChild)
-}
-  }
-
-  test("SPARK-32999: TreeNode.nodeName should not throw malformed class name 
error") {
-val testTriggersExpectedError = try {
-  classOf[MalformedClassObject.MalformedNameExpression].getSimpleName
-  false
-} catch {
-  case ex: java.lang.InternalError if ex.getMessage.contains("Malformed 
class name") =>
-true
-  case ex: Throwable => throw ex
-}
-// This test case only applies on older JDK versions (e.g. JDK8u), and 
doesn't trigger the
-// issue on newer JDK versions (e.g. JDK11u).
-assume(testTriggersExpectedError, "the test case didn't trigger malformed 
class name error")
-
-val expr = MalformedClassObject.MalformedNameExpression(Literal(1))
-try {
-  expr.nodeName
-} catch {
-  case ex: java.lang.InternalError if ex.getMessage.contains("Malformed 
class name") =>
-fail("TreeNode.nodeName should not throw malformed class name error")
-}
-  }
-
   test("SPARK-37800: TreeNode.argString incorrectly formats arguments of type 
Set[_]") {
 case class Node(set: Set[String], nested: Seq[Set[Int]]) extends LeafNode {
   val output: Seq[Attribute] = Nil


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-45303][CORE] Remove JDK 8/11 workaround in KryoSerializerBenchmark

2023-09-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4eb5577ece2 [SPARK-45303][CORE] Remove JDK 8/11 workaround in 
KryoSerializerBenchmark
4eb5577ece2 is described below

commit 4eb5577ece2449676c804e358a4a07fcc52ce670
Author: Hyukjin Kwon 
AuthorDate: Mon Sep 25 09:58:58 2023 -0700

[SPARK-45303][CORE] Remove JDK 8/11 workaround in KryoSerializerBenchmark

### What changes were proposed in this pull request?

This PR removes the legacy workaround for JDK 8/11 in SPARK-29282.
They were already removed in SPARK-37293. This is the leftover.

### Why are the changes needed?

For consistency.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Fixed unittests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43088 from HyukjinKwon/SPARK-45303.

Authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala| 2 +-
 .../scala/org/apache/spark/serializer/KryoSerializerBenchmark.scala   | 4 
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala 
b/core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala
index 99620fc9757..5eb22032a5e 100644
--- a/core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala
+++ b/core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala
@@ -23,7 +23,7 @@ import org.apache.spark.internal.config.Tests.IS_TESTING
 
 /**
  * A base class for generate benchmark results to a file.
- * For JDK9+, JDK major version number is added to the file names to 
distinguish the results.
+ * For JDK 21+, JDK major version number is added to the file names to 
distinguish the results.
  */
 abstract class BenchmarkBase {
   var output: Option[OutputStream] = None
diff --git 
a/core/src/test/scala/org/apache/spark/serializer/KryoSerializerBenchmark.scala 
b/core/src/test/scala/org/apache/spark/serializer/KryoSerializerBenchmark.scala
index e1e4c218e9c..97051e375cf 100644
--- 
a/core/src/test/scala/org/apache/spark/serializer/KryoSerializerBenchmark.scala
+++ 
b/core/src/test/scala/org/apache/spark/serializer/KryoSerializerBenchmark.scala
@@ -28,7 +28,6 @@ import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
 import org.apache.spark.internal.config._
 import org.apache.spark.internal.config.Kryo._
 import org.apache.spark.internal.config.Tests.IS_TESTING
-import org.apache.spark.launcher.SparkLauncher.EXECUTOR_EXTRA_JAVA_OPTIONS
 import org.apache.spark.serializer.KryoTest._
 import org.apache.spark.util.ThreadUtils
 
@@ -76,9 +75,6 @@ object KryoSerializerBenchmark extends BenchmarkBase {
 
   def createSparkContext(usePool: Boolean): SparkContext = {
 val conf = new SparkConf()
-// SPARK-29282 This is for consistency between JDK8 and JDK11.
-conf.set(EXECUTOR_EXTRA_JAVA_OPTIONS,
-  "-XX:+UseParallelGC -XX:-UseDynamicNumberOfGCThreads")
 conf.set(SERIALIZER, "org.apache.spark.serializer.KryoSerializer")
 conf.set(KRYO_USER_REGISTRATORS, Seq(classOf[MyRegistrator].getName))
 conf.set(KRYO_USE_POOL, usePool)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-45313][CORE] Inline `Iterators#size` and remove `Iterators.scala`

2023-09-25 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 30c73f6e103 [SPARK-45313][CORE] Inline `Iterators#size` and remove 
`Iterators.scala`
30c73f6e103 is described below

commit 30c73f6e103851ee1f3ce012572455ab3c9d5625
Author: yangjie01 
AuthorDate: Tue Sep 26 00:40:29 2023 +0800

[SPARK-45313][CORE] Inline `Iterators#size` and remove `Iterators.scala`

### What changes were proposed in this pull request?
This PR inlines the code of `Iterators#size` and removes `Iterators.scala`.

### Why are the changes needed?
https://github.com/apache/spark/pull/37353 introduced optimizations based 
on Scala 2.13 for the `Utils.getIteratorSize` function, hence there exist 
different versions of `Iterators.scala` for Scala 2.12 and Scala 2.13.

Currently, Apache Spark 4.0 no longer supports Scala 2.12, so the 
corresponding code simplification can be performed.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43100 from LuciferYang/SPARK-45313.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
---
 .../scala/org/apache/spark/util/Iterators.scala| 40 --
 .../main/scala/org/apache/spark/util/Utils.scala   | 12 ++-
 2 files changed, 11 insertions(+), 41 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/util/Iterators.scala 
b/core/src/main/scala/org/apache/spark/util/Iterators.scala
deleted file mode 100644
index 9756cf49b95..000
--- a/core/src/main/scala/org/apache/spark/util/Iterators.scala
+++ /dev/null
@@ -1,40 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.util
-
-private[util] object Iterators {
-
-  /**
-   * Counts the number of elements of an iterator.
-   * This method is slower than `iterator.size` when using Scala 2.13,
-   * but it can avoid overflowing problem.
-   */
-  def size(iterator: Iterator[_]): Long = {
-// SPARK-39928: For Scala 2.13, add check of `iterator.knownSize` refer to
-// `IterableOnceOps#size` to reduce the performance gap with 
`iterator.size`.
-if (iterator.knownSize > 0) iterator.knownSize.toLong
-else {
-  var count = 0L
-  while (iterator.hasNext) {
-count += 1L
-iterator.next()
-  }
-  count
-}
-  }
-}
diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala 
b/core/src/main/scala/org/apache/spark/util/Utils.scala
index 149071ee1b6..b9f7eccdfe1 100644
--- a/core/src/main/scala/org/apache/spark/util/Utils.scala
+++ b/core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -1785,7 +1785,17 @@ private[spark] object Utils
   /**
* Counts the number of elements of an iterator.
*/
-  def getIteratorSize(iterator: Iterator[_]): Long = Iterators.size(iterator)
+  def getIteratorSize(iterator: Iterator[_]): Long = {
+if (iterator.knownSize >= 0) iterator.knownSize.toLong
+else {
+  var count = 0L
+  while (iterator.hasNext) {
+count += 1L
+iterator.next()
+  }
+  count
+}
+  }
 
   /**
* Generate a zipWithIndex iterator, avoid index value overflowing problem
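
A short illustration of the behaviour the inlined helper relies on (an editor's sketch, not part of the patch): on Scala 2.13, iterators over indexed collections report their length up front through `knownSize`, while lazy iterators report -1 and fall back to the counting loop, which counts with a `Long` to avoid the `Int` overflow that a plain `Iterator#size` would hit.

```scala
object IteratorSizeSketch {
  // Same logic as the inlined Utils.getIteratorSize shown in the diff above.
  def getIteratorSize(iterator: Iterator[_]): Long = {
    if (iterator.knownSize >= 0) iterator.knownSize.toLong
    else {
      var count = 0L
      while (iterator.hasNext) {
        count += 1L
        iterator.next()
      }
      count
    }
  }

  def main(args: Array[String]): Unit = {
    println(getIteratorSize(Vector(1, 2, 3).iterator)) // 3, taken from knownSize
    println(getIteratorSize(Iterator.from(1).take(5))) // 5, counted by the loop
  }
}
```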


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-45320][SQL][TESTS] Update benchmark result for `InMemoryColumnarBenchmark`

2023-09-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3ff8977bf65 [SPARK-45320][SQL][TESTS] Update benchmark result for 
`InMemoryColumnarBenchmark`
3ff8977bf65 is described below

commit 3ff8977bf6584699fb4e7c24912a3a2e8405fca0
Author: yangjie01 
AuthorDate: Mon Sep 25 09:17:41 2023 -0700

[SPARK-45320][SQL][TESTS] Update benchmark result for 
`InMemoryColumnarBenchmark`

### What changes were proposed in this pull request?
This PR aims to update the `InMemoryColumnarBenchmark` benchmark result for Java 17 and to add a benchmark result for Java 21.

### Why are the changes needed?
Track the performance of Java 17/21 through micro-benchmark testing.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43105 from LuciferYang/SPARK-45320.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 ...rk-results.txt => InMemoryColumnarBenchmark-jdk21-results.txt} | 8 
 sql/core/benchmarks/InMemoryColumnarBenchmark-results.txt | 6 +++---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/sql/core/benchmarks/InMemoryColumnarBenchmark-results.txt 
b/sql/core/benchmarks/InMemoryColumnarBenchmark-jdk21-results.txt
similarity index 57%
copy from sql/core/benchmarks/InMemoryColumnarBenchmark-results.txt
copy to sql/core/benchmarks/InMemoryColumnarBenchmark-jdk21-results.txt
index fee34039a3d..f757ce2d707 100644
--- a/sql/core/benchmarks/InMemoryColumnarBenchmark-results.txt
+++ b/sql/core/benchmarks/InMemoryColumnarBenchmark-jdk21-results.txt
@@ -2,11 +2,11 @@
 Int In-memory with 100 rows
 

 
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1031-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+OpenJDK 64-Bit Server VM 21+35 on Linux 5.15.0-1046-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
 Int In-Memory scan: Best Time(ms)   Avg Time(ms)   
Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
 
--
-columnar deserialization + columnar-to-row216235   
   27  4.6 215.9   1.0X
-row-based deserialization 179182   
3  5.6 178.8   1.2X
+columnar deserialization + columnar-to-row336413   
   67  3.0 336.2   1.0X
+row-based deserialization 215285   
   61  4.7 214.7   1.6X
 
 
diff --git a/sql/core/benchmarks/InMemoryColumnarBenchmark-results.txt 
b/sql/core/benchmarks/InMemoryColumnarBenchmark-results.txt
index fee34039a3d..e2da2eed94e 100644
--- a/sql/core/benchmarks/InMemoryColumnarBenchmark-results.txt
+++ b/sql/core/benchmarks/InMemoryColumnarBenchmark-results.txt
@@ -2,11 +2,11 @@
 Int In-memory with 100 rows
 

 
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1031-azure
+OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1046-azure
 Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
 Int In-Memory scan: Best Time(ms)   Avg Time(ms)   
Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
 
--
-columnar deserialization + columnar-to-row216235   
   27  4.6 215.9   1.0X
-row-based deserialization 179182   
3  5.6 178.8   1.2X
+columnar deserialization + columnar-to-row274437   
  146  3.7 273.8   1.0X
+row-based deserialization 263308   
   39  3.8 263.2   1.0X
 
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-45307][INFRA] Use Zulu JDK in `benchmark` GitHub Action and Java 21

2023-09-25 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6e16401fbff [SPARK-45307][INFRA] Use Zulu JDK in `benchmark` GitHub 
Action and Java 21
6e16401fbff is described below

commit 6e16401fbff9b25df39daf3b1060c8e21e92c55d
Author: panbingkun 
AuthorDate: Mon Sep 25 22:39:31 2023 +0800

[SPARK-45307][INFRA] Use Zulu JDK in `benchmark` GitHub Action and Java 21

### What changes were proposed in this pull request?
The PR aims to use the Zulu JDK in the `benchmark` GitHub Action and Java 21.

### Why are the changes needed?
When I was preparing to obtain the results of 
`org.apache.spark.MapStatusesConvertBenchmark` benchmark running on JDK21 in 
GA, the following error occurred,
https://github.com/panbingkun/spark/actions/runs/6293925655/job/17085885694
https://github.com/apache/spark/assets/15246973/36e293e0-cae8-4764-a93a-93a139b6eaaa

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manually test.
https://github.com/panbingkun/spark/actions/runs/6295588793

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #43094 from panbingkun/SPARK-45307.

Authored-by: panbingkun 
Signed-off-by: yangjie01 
---
 .github/workflows/benchmark.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml
index 79afb46f301..11ebb8ae7e2 100644
--- a/.github/workflows/benchmark.yml
+++ b/.github/workflows/benchmark.yml
@@ -107,7 +107,7 @@ jobs:
 if: steps.cache-tpcds-sf-1.outputs.cache-hit != 'true'
 uses: actions/setup-java@v3
 with:
-  distribution: temurin
+  distribution: zulu
   java-version: ${{ github.event.inputs.jdk }}
   - name: Generate TPC-DS (SF=1) table data
 if: steps.cache-tpcds-sf-1.outputs.cache-hit != 'true'
@@ -159,7 +159,7 @@ jobs:
 - name: Install Java ${{ github.event.inputs.jdk }}
   uses: actions/setup-java@v3
   with:
-distribution: temurin
+distribution: zulu
 java-version: ${{ github.event.inputs.jdk }}
 - name: Cache TPC-DS generated data
   if: contains(github.event.inputs.class, 'TPCDSQueryBenchmark') || 
contains(github.event.inputs.class, '*')


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-45298][SPARK-31959][SQL][TESTS] Remove the workaround for JDK-8228469 in test

2023-09-25 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a975a086093 [SPARK-45298][SPARK-31959][SQL][TESTS] Remove the 
workaround for JDK-8228469 in test
a975a086093 is described below

commit a975a086093a63075cc3a2d7e944a7075e3f185e
Author: Hyukjin Kwon 
AuthorDate: Mon Sep 25 23:22:35 2023 +0900

[SPARK-45298][SPARK-31959][SQL][TESTS] Remove the workaround for 
JDK-8228469 in test

### What changes were proposed in this pull request?

This PR removes the legacy workaround for old JDK added at SPARK-31959

### Why are the changes needed?

To remove legacy workaround. We dropped JDK 8/11 at SPARK-44112

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unittest/docs added in SPARK-31959

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43083 from HyukjinKwon/SPARK-45298.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 .../sql/catalyst/util/RebaseDateTimeSuite.scala| 22 ++
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/RebaseDateTimeSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/RebaseDateTimeSuite.scala
index a17ca2358de..0a44db5a699 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/RebaseDateTimeSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/RebaseDateTimeSuite.scala
@@ -417,22 +417,12 @@ class RebaseDateTimeSuite extends SparkFunSuite with 
Matchers with SQLHelper {
 // clocks were moved backward to become Sunday, 18 November, 1945 01:00:00 
AM.
 // In this way, the overlap happened w/o Daylight Saving Time.
 val hkZid = getZoneId("Asia/Hong_Kong")
-var expected = "1945-11-18 01:30:00.0"
-var ldt = LocalDateTime.of(1945, 11, 18, 1, 30, 0)
-var earlierMicros = 
instantToMicros(ldt.atZone(hkZid).withEarlierOffsetAtOverlap().toInstant)
-var laterMicros = 
instantToMicros(ldt.atZone(hkZid).withLaterOffsetAtOverlap().toInstant)
-var overlapInterval = MICROS_PER_HOUR
-if (earlierMicros + overlapInterval != laterMicros) {
-  // Old JDK might have an outdated time zone database.
-  // See https://bugs.openjdk.java.net/browse/JDK-8228469: "Hong Kong ... 
Its 1945 transition
-  // from JST to HKT was on 11-18 at 02:00, not 09-15 at 00:00"
-  expected = "1945-09-14 23:30:00.0"
-  ldt = LocalDateTime.of(1945, 9, 14, 23, 30, 0)
-  earlierMicros = 
instantToMicros(ldt.atZone(hkZid).withEarlierOffsetAtOverlap().toInstant)
-  laterMicros = 
instantToMicros(ldt.atZone(hkZid).withLaterOffsetAtOverlap().toInstant)
-  // If time zone db doesn't have overlapping at all, set the overlap 
interval to zero.
-  overlapInterval = laterMicros - earlierMicros
-}
+val expected = "1945-11-18 01:30:00.0"
+val ldt = LocalDateTime.of(1945, 11, 18, 1, 30, 0)
+val earlierMicros = 
instantToMicros(ldt.atZone(hkZid).withEarlierOffsetAtOverlap().toInstant)
+val laterMicros = 
instantToMicros(ldt.atZone(hkZid).withLaterOffsetAtOverlap().toInstant)
+val overlapInterval = MICROS_PER_HOUR
+assert(earlierMicros + overlapInterval == laterMicros)
 val hkTz = TimeZone.getTimeZone(hkZid)
 val rebasedEarlierMicros = rebaseGregorianToJulianMicros(hkTz, 
earlierMicros)
 val rebasedLaterMicros = rebaseGregorianToJulianMicros(hkTz, laterMicros)
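
For context on what the simplified assertion checks, here is an editor's sketch (assuming an up-to-date IANA tzdata, which is what the patch now relies on): at the Asia/Hong_Kong transition on 1945-11-18 the same local wall-clock time maps to two instants, and the earlier and later offsets differ by exactly one hour.

```scala
import java.time.{LocalDateTime, ZoneId}
import java.time.temporal.ChronoUnit

object HongKongOverlapSketch {
  def main(args: Array[String]): Unit = {
    val hk = ZoneId.of("Asia/Hong_Kong")
    // 01:30 local time falls inside the one-hour overlap created when clocks
    // were moved back from JST (UTC+9) to HKT (UTC+8).
    val ldt = LocalDateTime.of(1945, 11, 18, 1, 30, 0)
    val earlier = ldt.atZone(hk).withEarlierOffsetAtOverlap().toInstant
    val later = ldt.atZone(hk).withLaterOffsetAtOverlap().toInstant
    // Prints 3600 with a current time zone database, matching the new assertion.
    println(ChronoUnit.SECONDS.between(earlier, later))
  }
}
```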


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-45247][PYTHON][TESTS][FOLLOW-UP] Deduplicate FrameReidexingTests.test_filter test

2023-09-25 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6e327858d3c [SPARK-45247][PYTHON][TESTS][FOLLOW-UP] Deduplicate 
FrameReidexingTests.test_filter test
6e327858d3c is described below

commit 6e327858d3ccf5cb34297147c7a9c6e54f7218a2
Author: Haejoon Lee 
AuthorDate: Mon Sep 25 22:19:50 2023 +0900

[SPARK-45247][PYTHON][TESTS][FOLLOW-UP] Deduplicate 
FrameReidexingTests.test_filter test

### What changes were proposed in this pull request?

This is a follow-up PR for https://github.com/apache/spark/pull/43025 to clean up the duplicated tests in the code.

### Why are the changes needed?

Cleanup

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

No test needed / the existing CI should pass.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43101 from itholic/pandas_2.1.1_followup.

Authored-by: Haejoon Lee 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/pandas/tests/frame/test_reindexing.py | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/python/pyspark/pandas/tests/frame/test_reindexing.py 
b/python/pyspark/pandas/tests/frame/test_reindexing.py
index 3e40c35edd6..606efd95188 100644
--- a/python/pyspark/pandas/tests/frame/test_reindexing.py
+++ b/python/pyspark/pandas/tests/frame/test_reindexing.py
@@ -856,9 +856,6 @@ class FrameReindexingMixin:
 
 
 class FrameReidexingTests(FrameReindexingMixin, ComparisonTestBase, 
SQLTestUtils):
-def test_filter(self):
-super().test_filter()
-
 pass
 
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-45295][CORE][SQL] Remove Utils.isMemberClass workaround for JDK 8

2023-09-25 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f94cc9f7e08 [SPARK-45295][CORE][SQL] Remove Utils.isMemberClass 
workaround for JDK 8
f94cc9f7e08 is described below

commit f94cc9f7e0858697f04486bf52f34fbaa4b0106e
Author: Hyukjin Kwon 
AuthorDate: Mon Sep 25 22:18:32 2023 +0900

[SPARK-45295][CORE][SQL] Remove Utils.isMemberClass workaround for JDK 8

### What changes were proposed in this pull request?

This PR removes the legacy workaround for JDK 8 added at SPARK-34607

### Why are the changes needed?

To remove legacy workaround. We dropped JDK 8 at SPARK-44112

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unittest added in SPARK-34607

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43080 from HyukjinKwon/SPARK-45295.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 .../org/apache/spark/util/SparkClassUtils.scala| 28 --
 .../spark/sql/catalyst/encoders/OuterScopes.scala  |  2 +-
 .../sql/catalyst/expressions/objects/objects.scala |  2 +-
 3 files changed, 2 insertions(+), 30 deletions(-)

diff --git 
a/common/utils/src/main/scala/org/apache/spark/util/SparkClassUtils.scala 
b/common/utils/src/main/scala/org/apache/spark/util/SparkClassUtils.scala
index 5984eaee42e..42d6d9fb421 100644
--- a/common/utils/src/main/scala/org/apache/spark/util/SparkClassUtils.scala
+++ b/common/utils/src/main/scala/org/apache/spark/util/SparkClassUtils.scala
@@ -50,34 +50,6 @@ private[spark] trait SparkClassUtils {
   def classIsLoadable(clazz: String): Boolean = {
 Try { classForName(clazz, initialize = false) }.isSuccess
   }
-
-  /**
-   * Returns true if and only if the underlying class is a member class.
-   *
-   * Note: jdk8u throws a "Malformed class name" error if a given class is a 
deeply-nested
-   * inner class (See SPARK-34607 for details). This issue has already been 
fixed in jdk9+, so
-   * we can remove this helper method safely if we drop the support of jdk8u.
-   */
-  def isMemberClass(cls: Class[_]): Boolean = {
-try {
-  cls.isMemberClass
-} catch {
-  case _: InternalError =>
-// We emulate jdk8u `Class.isMemberClass` below:
-//   public boolean isMemberClass() {
-// return getSimpleBinaryName() != null && 
!isLocalOrAnonymousClass();
-//   }
-// `getSimpleBinaryName()` returns null if a given class is a 
top-level class,
-// so we replace it with `cls.getEnclosingClass != null`. The second 
condition checks
-// if a given class is not a local or an anonymous class, so we 
replace it with
-// `cls.getEnclosingMethod == null` because `cls.getEnclosingMethod()` 
return a value
-// only in either case (JVM Spec 4.8.6).
-//
-// Note: The newer jdk evaluates `!isLocalOrAnonymousClass()` first,
-// we reorder the conditions to follow it.
-cls.getEnclosingMethod == null && cls.getEnclosingClass != null
-}
-  }
 }
 
 private[spark] object SparkClassUtils extends SparkClassUtils
diff --git 
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/encoders/OuterScopes.scala
 
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/encoders/OuterScopes.scala
index b497cd3f386..85876889569 100644
--- 
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/encoders/OuterScopes.scala
+++ 
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/encoders/OuterScopes.scala
@@ -70,7 +70,7 @@ object OuterScopes {
* useful for inner class defined in REPL.
*/
   def getOuterScope(innerCls: Class[_]): () => AnyRef = {
-if (!SparkClassUtils.isMemberClass(innerCls)) {
+if (!innerCls.isMemberClass) {
   return null
 }
 val outerClass = innerCls.getDeclaringClass
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
index 32bcdaf8609..beb07259384 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
@@ -557,7 +557,7 @@ case class NewInstance(
 // Note that static inner classes (e.g., inner classes within Scala 
objects) don't need
 // outer pointer registration.
 val needOuterPointer =
-  outerPointer.isEmpty && Utils.isMemberClass(cls) && 
!Modifier.isStatic(cls.getModifiers)
+  outerPointer.isEmpty && cls.isMemberClass && 
!Modifier.isStatic(cls.getModifiers)
 childrenResolved && !needOuterPointer
   }
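
As a small illustration of why the emulation can go away (an editor's sketch, not taken from the patch): on the JDK versions Spark now requires, `Class#isMemberClass` can be called directly, including for classes nested inside Scala objects.

```scala
object MemberClassSketch {
  object Outer {
    case class Inner(x: Int) // a member class: declared inside another class/object
  }

  def main(args: Array[String]): Unit = {
    class Local // a local class: declared inside a method
    println(classOf[Outer.Inner].isMemberClass) // true
    println(classOf[Local].isMemberClass)       // false (local class)
    println(classOf[String].isMemberClass)      // false (top-level class)
  }
}
```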
 



[spark] branch master updated: [SPARK-45297][SQL] Remove workaround for dateformatter added in SPARK-31827

2023-09-25 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e9cc76a27f6 [SPARK-45297][SQL] Remove workaround for dateformatter 
added in SPARK-31827
e9cc76a27f6 is described below

commit e9cc76a27f6f372cde1c885055cfe0bdd4fd4e7d
Author: Hyukjin Kwon 
AuthorDate: Mon Sep 25 22:15:37 2023 +0900

[SPARK-45297][SQL] Remove workaround for dateformatter added in SPARK-31827

### What changes were proposed in this pull request?

This PR removes the legacy workaround for JDK 8 added at SPARK-31827

### Why are the changes needed?

To remove legacy workaround. We dropped JDK 8 at SPARK-44112

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unittest/docs added in SPARK-31827

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43082 from HyukjinKwon/SPARK-45297.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 .../spark/sql/catalyst/util/DateTimeFormatterHelper.scala | 15 ---
 1 file changed, 15 deletions(-)

diff --git 
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
 
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
index 43701d1d8ff..e2a897a3211 100644
--- 
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
+++ 
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
@@ -262,15 +262,6 @@ private object DateTimeFormatterHelper {
 toFormatter(builder, TimestampFormatter.defaultLocale)
   }
 
-  private final val bugInStandAloneForm = {
-// Java 8 has a bug for stand-alone form. See 
https://bugs.openjdk.java.net/browse/JDK-8114833
-// Note: we only check the US locale so that it's a static check. It can 
produce false-negative
-// as some locales are not affected by the bug. Since `L`/`q` is rarely 
used, we choose to not
-// complicate the check here.
-// TODO: remove it when we drop Java 8 support.
-val formatter = DateTimeFormatter.ofPattern("LLL qqq", Locale.US)
-formatter.format(LocalDate.of(2000, 1, 1)) == "1 1"
-  }
   // SPARK-31892: The week-based date fields are rarely used and really 
confusing for parsing values
   // to datetime, especially when they are mixed with other non-week-based 
ones;
   // SPARK-31879: It's also difficult for us to restore the behavior of 
week-based date fields
@@ -328,12 +319,6 @@ private object DateTimeFormatterHelper {
   for (style <- unsupportedPatternLengths if 
patternPart.contains(style)) {
 throw new IllegalArgumentException(s"Too many pattern letters: 
${style.head}")
   }
-  if (bugInStandAloneForm && (patternPart.contains("LLL") || 
patternPart.contains("qqq"))) {
-throw new IllegalArgumentException("Java 8 has a bug to support 
stand-alone " +
-  "form (3 or more 'L' or 'q' in the pattern string). Please use 
'M' or 'Q' instead, " +
-  "or upgrade your Java version. For more details, please read " +
-  "https://bugs.openjdk.java.net/browse/JDK-8114833;)
-  }
   // In DateTimeFormatter, 'u' supports negative years. We substitute 
'y' to 'u' here for
   // keeping the support in Spark 3.0. If parse failed in Spark 3.0, 
fall back to 'y'.
   // We only do this substitution when there is no era designator 
found in the pattern.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-45296][INFRA][BUILD] Comment out unused JDK 11 related in dev/run-tests.py

2023-09-25 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c326928edde [SPARK-45296][INFRA][BUILD] Comment out unused JDK 11 
related in dev/run-tests.py
c326928edde is described below

commit c326928edde319c0d8ff3ff723c7711f8596ca3f
Author: Hyukjin Kwon 
AuthorDate: Mon Sep 25 20:27:36 2023 +0900

[SPARK-45296][INFRA][BUILD] Comment out unused JDK 11 related in 
dev/run-tests.py

### What changes were proposed in this pull request?

This PR proposes to comment out the unused JDK 11-related code in `dev/run-tests.py`.

### Why are the changes needed?

For readability, and to comment out unused code. I added some explanation inline.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

No.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43081 from HyukjinKwon/SPARK-45296.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 dev/run-tests.py | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/dev/run-tests.py b/dev/run-tests.py
index 57fe1de811d..cf0db66fba1 100755
--- a/dev/run-tests.py
+++ b/dev/run-tests.py
@@ -361,12 +361,13 @@ def run_scala_tests(build_tool, extra_profiles, 
test_modules, excluded_tags, inc
 if excluded_tags:
 test_profiles += ["-Dtest.exclude.tags=" + ",".join(excluded_tags)]
 
-# set up java11 env if this is a pull request build with 'test-java11' in 
the title
-if "ghprbPullTitle" in os.environ:
-if "test-java11" in os.environ["ghprbPullTitle"].lower():
-os.environ["JAVA_HOME"] = "/usr/java/jdk-11.0.1"
-os.environ["PATH"] = "%s/bin:%s" % (os.environ["JAVA_HOME"], 
os.environ["PATH"])
-test_profiles += ["-Djava.version=11"]
+# SPARK-45296: legacy code for Jenkins. If we move to Jenkins, we should
+# revive this logic with a different combination of JDK.
+# if "ghprbPullTitle" in os.environ:
+# if "test-java11" in os.environ["ghprbPullTitle"].lower():
+# os.environ["JAVA_HOME"] = "/usr/java/jdk-11.0.1"
+# os.environ["PATH"] = "%s/bin:%s" % (os.environ["JAVA_HOME"], 
os.environ["PATH"])
+# test_profiles += ["-Djava.version=11"]
 
 if build_tool == "maven":
 run_scala_tests_maven(test_profiles)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.5 updated: [SPARK-45306][SQL][TESTS] Make `InMemoryColumnarBenchmark` use AQE-aware utils to collect plans

2023-09-25 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 3551f8b89f1 [SPARK-45306][SQL][TESTS] Make `InMemoryColumnarBenchmark` 
use AQE-aware utils to collect plans
3551f8b89f1 is described below

commit 3551f8b89f1d70a9218b8c0331bddc06c5020e95
Author: yangjie01 
AuthorDate: Mon Sep 25 19:06:02 2023 +0800

[SPARK-45306][SQL][TESTS] Make `InMemoryColumnarBenchmark` use AQE-aware 
utils to collect plans

### What changes were proposed in this pull request?
This pr makes `InMemoryColumnarBenchmark` inherit from 
AdaptiveSparkPlanHelper and use the `AdaptiveSparkPlanHelper#collect` function 
to collect plans, enabling `InMemoryColumnarBenchmark` to run successfully.

### Why are the changes needed?
After SPARK-42768 was merged, the default value of `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` changed from false to true, so `InMemoryColumnarBenchmark` should use AQE-aware utils to collect plans.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual verification.

run `build/sbt "sql/Test/runMain 
org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark"`

**Before**

```
[error] Exception in thread "main" java.lang.IndexOutOfBoundsException: 0
[error] at scala.collection.LinearSeqOps.apply(LinearSeq.scala:131)
[error] at scala.collection.LinearSeqOps.apply$(LinearSeq.scala:128)
[error] at scala.collection.immutable.List.apply(List.scala:79)
[error] at 
org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark$.intCache(InMemoryColumnarBenchmark.scala:47)
[error] at 
org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark$.$anonfun$runBenchmarkSuite$1(InMemoryColumnarBenchmark.scala:68)
[error] at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
[error] at 
org.apache.spark.benchmark.BenchmarkBase.runBenchmark(BenchmarkBase.scala:42)
[error] at 
org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark$.runBenchmarkSuite(InMemoryColumnarBenchmark.scala:68)
[error] at 
org.apache.spark.benchmark.BenchmarkBase.main(BenchmarkBase.scala:72)
[error] at 
org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark.main(InMemoryColumnarBenchmark.scala)
[error] Nonzero exit code returned from runner: 1
[error] (sql / Test / runMain) Nonzero exit code returned from runner: 1
```

**After**

```
[info] OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Mac OS X 13.5.2
[info] Apple M2 Max
[info] Int In-Memory scan: Best Time(ms)   Avg 
Time(ms)   Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
[info] 
--
[info] columnar deserialization + columnar-to-row 95
116  34 10.5  95.4   1.0X
[info] row-based deserialization  85
 99  22 11.8  85.1   1.1X
```
### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43093 from LuciferYang/fix-InMemoryColumnarBenchmark.

Authored-by: yangjie01 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 7e9666be15b5210db00231faacd3cfa15ed71907)
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/execution/columnar/InMemoryColumnarBenchmark.scala | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarBenchmark.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarBenchmark.scala
index 55d9fb27317..1f132dabd28 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarBenchmark.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarBenchmark.scala
@@ -18,6 +18,7 @@ package org.apache.spark.sql.execution.columnar
 
 import org.apache.spark.benchmark.Benchmark
 import org.apache.spark.sql.execution.ColumnarToRowExec
+import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanHelper
 import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
 
 /**
@@ -33,11 +34,11 @@ import 
org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
  *  Results will be written to 
"benchmarks/InMemoryColumnarBenchmark-results.txt".
  * }}}
  */
-object InMemoryColumnarBenchmark extends SqlBasedBenchmark {
+object InMemoryColumnarBenchmark extends SqlBasedBenchmark with 
AdaptiveSparkPlanHelper {
   def 

[spark] branch master updated: [SPARK-45306][SQL][TESTS] Make `InMemoryColumnarBenchmark` use AQE-aware utils to collect plans

2023-09-25 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7e9666be15b [SPARK-45306][SQL][TESTS] Make `InMemoryColumnarBenchmark` 
use AQE-aware utils to collect plans
7e9666be15b is described below

commit 7e9666be15b5210db00231faacd3cfa15ed71907
Author: yangjie01 
AuthorDate: Mon Sep 25 19:06:02 2023 +0800

[SPARK-45306][SQL][TESTS] Make `InMemoryColumnarBenchmark` use AQE-aware 
utils to collect plans

### What changes were proposed in this pull request?
This pr makes `InMemoryColumnarBenchmark` inherit from 
AdaptiveSparkPlanHelper and use the `AdaptiveSparkPlanHelper#collect` function 
to collect plans, enabling `InMemoryColumnarBenchmark` to run successfully.

### Why are the changes needed?
After SPARK-42768 was merged, the default value of `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` changed from false to true, so `InMemoryColumnarBenchmark` should use AQE-aware utils to collect plans.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual verification.

run `build/sbt "sql/Test/runMain 
org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark"`

**Before**

```
[error] Exception in thread "main" java.lang.IndexOutOfBoundsException: 0
[error] at scala.collection.LinearSeqOps.apply(LinearSeq.scala:131)
[error] at scala.collection.LinearSeqOps.apply$(LinearSeq.scala:128)
[error] at scala.collection.immutable.List.apply(List.scala:79)
[error] at 
org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark$.intCache(InMemoryColumnarBenchmark.scala:47)
[error] at 
org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark$.$anonfun$runBenchmarkSuite$1(InMemoryColumnarBenchmark.scala:68)
[error] at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
[error] at 
org.apache.spark.benchmark.BenchmarkBase.runBenchmark(BenchmarkBase.scala:42)
[error] at 
org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark$.runBenchmarkSuite(InMemoryColumnarBenchmark.scala:68)
[error] at 
org.apache.spark.benchmark.BenchmarkBase.main(BenchmarkBase.scala:72)
[error] at 
org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark.main(InMemoryColumnarBenchmark.scala)
[error] Nonzero exit code returned from runner: 1
[error] (sql / Test / runMain) Nonzero exit code returned from runner: 1
```

**After**

```
[info] OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Mac OS X 13.5.2
[info] Apple M2 Max
[info] Int In-Memory scan: Best Time(ms)   Avg 
Time(ms)   Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
[info] 
--
[info] columnar deserialization + columnar-to-row 95
116  34 10.5  95.4   1.0X
[info] row-based deserialization  85
 99  22 11.8  85.1   1.1X
```
### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43093 from LuciferYang/fix-InMemoryColumnarBenchmark.

Authored-by: yangjie01 
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/execution/columnar/InMemoryColumnarBenchmark.scala | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarBenchmark.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarBenchmark.scala
index 55d9fb27317..1f132dabd28 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarBenchmark.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarBenchmark.scala
@@ -18,6 +18,7 @@ package org.apache.spark.sql.execution.columnar
 
 import org.apache.spark.benchmark.Benchmark
 import org.apache.spark.sql.execution.ColumnarToRowExec
+import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanHelper
 import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
 
 /**
@@ -33,11 +34,11 @@ import 
org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
  *  Results will be written to 
"benchmarks/InMemoryColumnarBenchmark-results.txt".
  * }}}
  */
-object InMemoryColumnarBenchmark extends SqlBasedBenchmark {
+object InMemoryColumnarBenchmark extends SqlBasedBenchmark with 
AdaptiveSparkPlanHelper {
   def intCache(rowsNum: Long, numIters: Int): Unit = {
 val data = spark.range(0, rowsNum, 1, 1).toDF("i").cache()
 
-   

[GitHub] [spark-website] panbingkun commented on pull request #480: Fix UI issue for `published` docs about Switch languages consistently across docs for all code snippets

2023-09-25 Thread via GitHub


panbingkun commented on PR #480:
URL: https://github.com/apache/spark-website/pull/480#issuecomment-1733085912

   cc @HyukjinKwon @allisonwang-db


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] panbingkun opened a new pull request, #480: Fix UI issue for `published` docs about Switch languages consistently across docs for all code snippets

2023-09-25 Thread via GitHub


panbingkun opened a new pull request, #480:
URL: https://github.com/apache/spark-website/pull/480

   The PR aims to fix a UI issue in the `published` docs for "Switch languages consistently across docs for all code snippets".
   
   https://github.com/apache/spark-website/pull/474#issuecomment-1731741985
   https://github.com/apache/spark-website/assets/15246973/50b86a6f-4bde--93bf-cb85b66e962c
   
   As discussed, we aim to fix the aforementioned issues by directly repairing 
files that have already been released in history.
   
   include versions:
   3.1.1, 3.1.2, 3.1.3, 3.2.0, 3.2.1, 3.2.2, 3.2.3, 3.2.4, 3.3.0, 3.3.1, 3.3.2, 
3.3.3, 3.4.0, 3.4.1
   
   Manually test:
   ```
   bundle exec jekyll serve --watch
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.5 updated: [SPARK-45291][SQL][REST] Use unknown query execution id instead of no such app when id is invalid

2023-09-25 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 94661758c30 [SPARK-45291][SQL][REST] Use unknown query execution id 
instead of no such app when id is invalid
94661758c30 is described below

commit 94661758c3072a279a29d0c493ce419af0414d3a
Author: Kent Yao 
AuthorDate: Mon Sep 25 14:23:46 2023 +0800

[SPARK-45291][SQL][REST] Use unknown query execution id instead of no such 
app when id is invalid

### What changes were proposed in this pull request?

This PR fixes `/api/v1/applications/{appId}/sql/{executionId}` API when the 
executionId is invalid.

Before this, we get `no such app: $appId`; after this, we get `unknown 
query execution id: $executionId`

### Why are the changes needed?

bugfix

### Does this PR introduce _any_ user-facing change?

no, bugfix

### How was this patch tested?

new test
### Was this patch authored or co-authored using generative AI tooling?

no

Closes #43073 from yaooqinn/SPARK-45291.

Authored-by: Kent Yao 
Signed-off-by: Kent Yao 
(cherry picked from commit 5d422155f1dae09f1631375d09e2f3c8dffba9a5)
Signed-off-by: Kent Yao 
---
 .../scala/org/apache/spark/status/api/v1/sql/SqlResource.scala   | 3 +--
 .../status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala| 9 +
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala 
b/sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala
index 3c96f612da6..fa5bea5f9bb 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala
@@ -56,10 +56,9 @@ private[v1] class SqlResource extends BaseAppResource {
   planDescription: Boolean): ExecutionData = {
 withUI { ui =>
   val sqlStore = new SQLAppStatusStore(ui.store.store)
-  val graph = sqlStore.planGraph(execId)
   sqlStore
 .execution(execId)
-.map(prepareExecutionData(_, graph, details, planDescription))
+.map(prepareExecutionData(_, sqlStore.planGraph(execId), details, 
planDescription))
 .getOrElse(throw new NotFoundException("unknown query execution id: " 
+ execId))
 }
   }
diff --git 
a/sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala
index 658f79fc289..c63c748953f 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala
@@ -19,6 +19,7 @@ package org.apache.spark.status.api.v1.sql
 
 import java.net.URL
 import java.text.SimpleDateFormat
+import javax.servlet.http.HttpServletResponse
 
 import org.json4s.DefaultFormats
 import org.json4s.jackson.JsonMethods
@@ -148,4 +149,12 @@ class SqlResourceWithActualMetricsSuite
 }
   }
 
+  test("SPARK-45291: Use unknown query execution id instead of no such app 
when id is invalid") {
+val url = new URL(spark.sparkContext.ui.get.webUrl +
+  
s"/api/v1/applications/${spark.sparkContext.applicationId}/sql/${Long.MaxValue}")
+val (code, resultOpt, error) = getContentAndCode(url)
+assert(code === HttpServletResponse.SC_NOT_FOUND)
+assert(resultOpt.isEmpty)
+assert(error.get === s"unknown query execution id: ${Long.MaxValue}")
+  }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-45291][SQL][REST] Use unknown query execution id instead of no such app when id is invalid

2023-09-25 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5d422155f1d [SPARK-45291][SQL][REST] Use unknown query execution id 
instead of no such app when id is invalid
5d422155f1d is described below

commit 5d422155f1dae09f1631375d09e2f3c8dffba9a5
Author: Kent Yao 
AuthorDate: Mon Sep 25 14:23:46 2023 +0800

[SPARK-45291][SQL][REST] Use unknown query execution id instead of no such 
app when id is invalid

### What changes were proposed in this pull request?

This PR fixes `/api/v1/applications/{appId}/sql/{executionId}` API when the 
executionId is invalid.

Before this, we get `no such app: $appId`; after this, we get `unknown 
query execution id: $executionId`

### Why are the changes needed?

bugfix

### Does this PR introduce _any_ user-facing change?

no, bugfix

### How was this patch tested?

new test
### Was this patch authored or co-authored using generative AI tooling?

no

Closes #43073 from yaooqinn/SPARK-45291.

Authored-by: Kent Yao 
Signed-off-by: Kent Yao 
---
 .../scala/org/apache/spark/status/api/v1/sql/SqlResource.scala   | 3 +--
 .../status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala| 9 +
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala 
b/sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala
index 3c96f612da6..fa5bea5f9bb 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala
@@ -56,10 +56,9 @@ private[v1] class SqlResource extends BaseAppResource {
   planDescription: Boolean): ExecutionData = {
 withUI { ui =>
   val sqlStore = new SQLAppStatusStore(ui.store.store)
-  val graph = sqlStore.planGraph(execId)
   sqlStore
 .execution(execId)
-.map(prepareExecutionData(_, graph, details, planDescription))
+.map(prepareExecutionData(_, sqlStore.planGraph(execId), details, 
planDescription))
 .getOrElse(throw new NotFoundException("unknown query execution id: " 
+ execId))
 }
   }
diff --git 
a/sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala
index 658f79fc289..c63c748953f 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala
@@ -19,6 +19,7 @@ package org.apache.spark.status.api.v1.sql
 
 import java.net.URL
 import java.text.SimpleDateFormat
+import javax.servlet.http.HttpServletResponse
 
 import org.json4s.DefaultFormats
 import org.json4s.jackson.JsonMethods
@@ -148,4 +149,12 @@ class SqlResourceWithActualMetricsSuite
 }
   }
 
+  test("SPARK-45291: Use unknown query execution id instead of no such app 
when id is invalid") {
+val url = new URL(spark.sparkContext.ui.get.webUrl +
+  
s"/api/v1/applications/${spark.sparkContext.applicationId}/sql/${Long.MaxValue}")
+val (code, resultOpt, error) = getContentAndCode(url)
+assert(code === HttpServletResponse.SC_NOT_FOUND)
+assert(resultOpt.isEmpty)
+assert(error.get === s"unknown query execution id: ${Long.MaxValue}")
+  }
 }
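
A quick way to see the new behaviour from outside the test suite (editor's sketch; the application id and UI port below are placeholders, take them from `sc.applicationId` and your actual UI URL in practice):

```scala
import java.net.{HttpURLConnection, URL}
import scala.io.Source

object SqlRestSketch {
  def main(args: Array[String]): Unit = {
    val appId = "app-20230925142346-0000" // hypothetical application id
    val url = new URL(
      s"http://localhost:4040/api/v1/applications/$appId/sql/${Long.MaxValue}")
    val conn = url.openConnection().asInstanceOf[HttpURLConnection]
    // Expect 404 with a body of "unknown query execution id: 9223372036854775807"
    // rather than the old "no such app" message.
    println(conn.getResponseCode)
    Option(conn.getErrorStream).foreach(s => println(Source.fromInputStream(s).mkString))
  }
}
```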


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-09-25 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new fb2bee37c96 [SPARK-42617][PS] Support `isocalendar` from the pandas 
2.0.0
fb2bee37c96 is described below

commit fb2bee37c964bf2164fc89a0a55085dd0c840b56
Author: zhyhimont 
AuthorDate: Mon Sep 25 15:22:32 2023 +0900

[SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

### What changes were proposed in this pull request?

Support `isocalendar` from the pandas 2.0.0

### Why are the changes needed?

When pandas 2.0.0 is released, we should match the behavior in pandas API 
on Spark.

### Does this PR introduce _any_ user-facing change?

Added a new method `DatetimeIndex.isocalendar` and removed two deprecated ones, `DatetimeIndex.week` and `DatetimeIndex.weekofyear`:
```
dfs = ps.from_pandas(pd.date_range(start='2019-12-29', freq='D', 
periods=4).to_series())
dfs.dt.isocalendar()
year  week  day
2019-12-29  2019527
2019-12-30  2020 11
2019-12-31  2020 12
2020-01-01  2020 13
dfs.dt.isocalendar().week
2019-12-2952
2019-12-30 1
2019-12-31 1
2020-01-01 1
```

### How was this patch tested?

UT was updated

Closes #40420 from dzhigimont/SPARK-42617_ZH.

Lead-authored-by: zhyhimont 
Co-authored-by: Zhyhimont Dmitry 
Co-authored-by: Dmitry Zhyhimont 
Co-authored-by: Zhyhimont Dmitry 
Signed-off-by: Hyukjin Kwon 
---
 .../source/reference/pyspark.pandas/indexing.rst   |  3 +-
 .../source/reference/pyspark.pandas/series.rst |  3 +-
 python/pyspark/pandas/datetimes.py | 70 --
 python/pyspark/pandas/indexes/base.py  |  4 +-
 python/pyspark/pandas/indexes/datetimes.py | 49 +--
 python/pyspark/pandas/namespace.py |  3 +-
 .../pyspark/pandas/tests/indexes/test_datetime.py  | 28 ++---
 .../pandas/tests/indexes/test_datetime_property.py | 19 +-
 .../pyspark/pandas/tests/test_series_datetime.py   | 17 +-
 9 files changed, 100 insertions(+), 96 deletions(-)

diff --git a/python/docs/source/reference/pyspark.pandas/indexing.rst 
b/python/docs/source/reference/pyspark.pandas/indexing.rst
index 70d463c052a..d6be57ee9c8 100644
--- a/python/docs/source/reference/pyspark.pandas/indexing.rst
+++ b/python/docs/source/reference/pyspark.pandas/indexing.rst
@@ -338,8 +338,7 @@ Time/date components
DatetimeIndex.minute
DatetimeIndex.second
DatetimeIndex.microsecond
-   DatetimeIndex.week
-   DatetimeIndex.weekofyear
+   DatetimeIndex.isocalendar
DatetimeIndex.dayofweek
DatetimeIndex.day_of_week
DatetimeIndex.weekday
diff --git a/python/docs/source/reference/pyspark.pandas/series.rst 
b/python/docs/source/reference/pyspark.pandas/series.rst
index 552acec096f..7b658d45d4b 100644
--- a/python/docs/source/reference/pyspark.pandas/series.rst
+++ b/python/docs/source/reference/pyspark.pandas/series.rst
@@ -313,8 +313,7 @@ Datetime Properties
Series.dt.minute
Series.dt.second
Series.dt.microsecond
-   Series.dt.week
-   Series.dt.weekofyear
+   Series.dt.isocalendar
Series.dt.dayofweek
Series.dt.weekday
Series.dt.dayofyear
diff --git a/python/pyspark/pandas/datetimes.py 
b/python/pyspark/pandas/datetimes.py
index b0649cf5761..4b6e23fae7a 100644
--- a/python/pyspark/pandas/datetimes.py
+++ b/python/pyspark/pandas/datetimes.py
@@ -18,7 +18,6 @@
 """
 Date/Time related functions on pandas-on-Spark Series
 """
-import warnings
 from typing import Any, Optional, Union, no_type_check
 
 import numpy as np
@@ -27,7 +26,9 @@ from pandas.tseries.offsets import DateOffset
 
 import pyspark.pandas as ps
 import pyspark.sql.functions as F
-from pyspark.sql.types import DateType, TimestampType, TimestampNTZType, 
LongType, IntegerType
+from pyspark.sql.types import DateType, TimestampType, TimestampNTZType, 
IntegerType
+from pyspark.pandas import DataFrame
+from pyspark.pandas.config import option_context
 
 
 class DatetimeMethods:
@@ -116,26 +117,59 @@ class DatetimeMethods:
 def nanosecond(self) -> "ps.Series":
 raise NotImplementedError()
 
-# TODO(SPARK-42617): Support isocalendar.week and replace it.
-# See also https://github.com/pandas-dev/pandas/pull/33595.
-@property
-def week(self) -> "ps.Series":
+def isocalendar(self) -> "ps.DataFrame":
 """
-The week ordinal of the year.
+Calculate year, week, and day according to the ISO 8601 standard.
 
-.. deprecated:: 3.4.0
-"""
-warnings.warn(
-"weekofyear and week have been deprecated.",
-FutureWarning,
-)
-