[spark-website] branch asf-site updated: Add Jungtaek Lim to committers.md
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 00a4de3  Add Jungtaek Lim to committers.md
00a4de3 is described below

commit 00a4de3e0b7e2545dd3b67c4c43bd96dc8db83a0
Author: Jungtaek Lim (HeartSaVioR)
AuthorDate: Thu Jul 2 14:08:22 2020 +0900

    Add Jungtaek Lim to committers.md

    Author: Jungtaek Lim (HeartSaVioR)

    Closes #277 from HeartSaVioR/add-committer-heartsavior.
---
 committers.md        | 1 +
 site/committers.html | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/committers.md b/committers.md
index 2627f36..42b89d4 100644
--- a/committers.md
+++ b/committers.md
@@ -50,6 +50,7 @@ navigation:
 |Davies Liu|Juicedata|
 |Cheng Lian|Databricks|
 |Yanbo Liang|Facebook|
+|Jungtaek Lim|Cloudera|
 |Sean McNamara|Oracle|
 |Xiangrui Meng|Databricks|
 |Mridul Muralidharam|Google|
diff --git a/site/committers.html b/site/committers.html
index bb781af..5299961 100644
--- a/site/committers.html
+++ b/site/committers.html
@@ -371,6 +371,10 @@
   Facebook
 
+  Jungtaek Lim
+  Cloudera
+
+
   Sean McNamara
   Oracle

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (8b0a54e6 -> 3659611)
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 8b0a54e6  [SPARK-32057][SQL][TEST-HIVE1.2][TEST-HADOOP2.7] ExecuteStatement: cancel and close should not transiently ERROR
     add 3659611   [SPARK-32124][CORE][FOLLOW-UP] Use the invalid value Int.MinValue to fill the map index when the event logs from the old Spark version

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/TaskEndReason.scala          | 3 ++-
 core/src/main/scala/org/apache/spark/util/JsonProtocol.scala      | 8 ++++++--
 core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala | 2 +-
 3 files changed, 9 insertions(+), 4 deletions(-)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32124][CORE][FOLLOW-UP] Use the invalid value Int.MinValue to fill the map index when the event logs from the old Spark version
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 9237fb2  [SPARK-32124][CORE][FOLLOW-UP] Use the invalid value Int.MinValue to fill the map index when the event logs from the old Spark version
9237fb2 is described below

commit 9237fb2ca90d16208634d70cdff1a0ea9ddcce26
Author: Yuanjian Li
AuthorDate: Wed Jul 8 09:36:06 2020 +0900

    [SPARK-32124][CORE][FOLLOW-UP] Use the invalid value Int.MinValue to fill the map index when the event logs from the old Spark version

    ### What changes were proposed in this pull request?

    Use the invalid value Int.MinValue to fill the map index when the event logs are from an old Spark version.

    ### Why are the changes needed?

    Follow-up PR for #28941.

    ### Does this PR introduce _any_ user-facing change?

    When the Spark 3.0 history server reads an event log written by an older Spark version, it fills the map index with the invalid value Int.MinValue.

    ### How was this patch tested?

    Existing UT.

    Closes #28965 from xuanyuanking/follow-up.

    Authored-by: Yuanjian Li
    Signed-off-by: Jungtaek Lim (HeartSaVioR)
    (cherry picked from commit 365961155a655f19c9184b16ccd493838c848213)
    Signed-off-by: Jungtaek Lim (HeartSaVioR)
---
 core/src/main/scala/org/apache/spark/TaskEndReason.scala          | 3 ++-
 core/src/main/scala/org/apache/spark/util/JsonProtocol.scala      | 8 ++++++--
 core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala | 2 +-
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/TaskEndReason.scala b/core/src/main/scala/org/apache/spark/TaskEndReason.scala
index b13028f..6606d31 100644
--- a/core/src/main/scala/org/apache/spark/TaskEndReason.scala
+++ b/core/src/main/scala/org/apache/spark/TaskEndReason.scala
@@ -90,7 +90,8 @@ case class FetchFailed(
   extends TaskFailedReason {
   override def toErrorString: String = {
     val bmAddressString = if (bmAddress == null) "null" else bmAddress.toString
-    s"FetchFailed($bmAddressString, shuffleId=$shuffleId, mapIndex=$mapIndex, " +
+    val mapIndexString = if (mapIndex == Int.MinValue) "Unknown" else mapIndex.toString
+    s"FetchFailed($bmAddressString, shuffleId=$shuffleId, mapIndex=$mapIndexString, " +
       s"mapId=$mapId, reduceId=$reduceId, message=\n$message\n)"
   }

diff --git a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
index 74ff5c7..78fbd0c 100644
--- a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
+++ b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
@@ -993,8 +993,12 @@ private[spark] object JsonProtocol {
     val blockManagerAddress = blockManagerIdFromJson(json \ "Block Manager Address")
     val shuffleId = (json \ "Shuffle ID").extract[Int]
     val mapId = (json \ "Map ID").extract[Long]
-    val mapIndex = (json \ "Map Index") match {
-      case JNothing => 0
+    val mapIndex = json \ "Map Index" match {
+      case JNothing =>
+        // Note, we use the invalid value Int.MinValue here to fill the map index for backward
+        // compatibility. Otherwise, the fetch failed event will be dropped when the history
+        // server loads the event log written by the Spark version before 3.0.
+        Int.MinValue
       case x => x.extract[Int]
     }
     val reduceId = (json \ "Reduce ID").extract[Int]

diff --git a/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala b/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
index 98aaa9e..b77cd81 100644
--- a/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
@@ -312,7 +312,7 @@ class JsonProtocolSuite extends SparkFunSuite {
     val oldEvent = JsonProtocol.taskEndReasonToJson(fetchFailed)
       .removeField({ _._1 == "Map Index" })
     val expectedFetchFailed = FetchFailed(BlockManagerId("With or", "without you", 15), 17, 16L,
-      0, 19, "ignored")
+      Int.MinValue, 19, "ignored")
     assert(expectedFetchFailed === JsonProtocol.taskEndReasonFromJson(oldEvent))
   }

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
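For readers skimming the diff above: the pattern is a sentinel default for a JSON field that older writers never emitted. Below is a minimal, self-contained sketch of that pattern, assuming json4s (the JSON library JsonProtocol is built on) is on the classpath; `MapIndexCompat`, `UnknownMapIndex`, and `render` are hypothetical names for illustration, not Spark APIs.

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

// Hypothetical, standalone sketch: not Spark's JsonProtocol.
object MapIndexCompat {
  implicit val formats: Formats = DefaultFormats

  // Sentinel mirroring the patch: Int.MinValue means "map index unknown",
  // used when an event log written before Spark 3.0 lacks "Map Index".
  val UnknownMapIndex: Int = Int.MinValue

  def mapIndexFromJson(json: JValue): Int = json \ "Map Index" match {
    case JNothing => UnknownMapIndex // field absent: pre-3.0 event log
    case x        => x.extract[Int]  // field present: use the real index
  }

  // Mirrors the FetchFailed.toErrorString change: show the sentinel as "Unknown".
  def render(mapIndex: Int): String =
    if (mapIndex == UnknownMapIndex) "Unknown" else mapIndex.toString

  def main(args: Array[String]): Unit = {
    println(render(mapIndexFromJson(parse("""{"Shuffle ID": 17}""")))) // Unknown
    println(render(mapIndexFromJson(parse("""{"Map Index": 3}"""))))   // 3
  }
}
```

The point of the sentinel over the old default of 0 is that 0 is a valid map index, so a reader could not tell a genuinely missing field from a real failure on the first map partition.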
[spark] branch master updated (371b35d -> 8e7fc04)
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 371b35d  [SPARK-32214][SQL] The type conversion function generated in makeFromJava for "other" type uses a wrong variable
     add 8e7fc04  [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing

No new revisions were added by this update.

Summary of changes:
 .../deploy/history/HistoryServerDiskManager.scala  | 21 ++++++++++++++++++---
 .../history/HistoryServerDiskManagerSuite.scala    | 46 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+), 3 deletions(-)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new ac2c6cd3  [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
ac2c6cd3 is described below

commit ac2c6cd3cfae369f6d1af6abea567263d34a29b2
Author: Zhen Li
AuthorDate: Wed Jul 8 21:58:45 2020 +0900

    [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing

    ### What changes were proposed in this pull request?

    Update ApplicationStoreInfo.size to the real directory size during HistoryServerDiskManager initialization.

    ### Why are the changes needed?

    This PR fixes bug [SPARK-32024](https://issues.apache.org/jira/browse/SPARK-32024). We found that after a history server restart, the error below would randomly happen, raised from `HistoryServerDiskManager`:
    "java.lang.IllegalStateException: Disk usage tracker went negative (now = -***, delta = -***)"

    **Cause**: Reading data from LevelDB can trigger table file compaction, which may in turn change the size of the LevelDB directory. This size change may not be recorded in LevelDB (`ApplicationStoreInfo` in `listing`). When the service restarts, `currentUsage` is calculated from the real directory size, but `ApplicationStoreInfo` entries are loaded from LevelDB, so `currentUsage` may be less than the sum of `ApplicationStoreInfo.size`. In the `makeRoom()` function, `ApplicationStoreInfo.size` is used to [...]

    **Reproduce**: we can reproduce this issue in a dev environment by reducing the config values of "spark.history.retainedApplications" and "spark.history.store.maxDiskUsage" to some small values. Steps:
    1. Start the history server, load some applications, and access some pages (maybe the "stages" page, to trigger LevelDB compaction).
    2. Restart the history server and refresh pages.

    I also added a UT to simulate this case in `HistoryServerDiskManagerSuite`.

    **Benefit**: this change helps improve history server reliability.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Added a unit test and manually tested it.

    Closes #28859 from zhli1142015/update-ApplicationStoreInfo.size-during-disk-manager-initialize.

    Authored-by: Zhen Li
    Signed-off-by: Jungtaek Lim (HeartSaVioR)
    (cherry picked from commit 8e7fc04637bbb8d2fdc2c758746e0eaf496c4d92)
    Signed-off-by: Jungtaek Lim (HeartSaVioR)
---
 .../deploy/history/HistoryServerDiskManager.scala  | 21 ++++++++++++++++++---
 .../history/HistoryServerDiskManagerSuite.scala    | 46 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala
index 0a1f333..4ddd6d9 100644
--- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala
@@ -75,14 +75,29 @@ private class HistoryServerDiskManager(
     // Go through the recorded store directories and remove any that may have been removed by
     // external code.
-    val orphans = listing.view(classOf[ApplicationStoreInfo]).asScala.filter { info =>
-      !new File(info.path).exists()
-    }.toSeq
+    val (existences, orphans) = listing
+      .view(classOf[ApplicationStoreInfo])
+      .asScala
+      .toSeq
+      .partition { info =>
+        new File(info.path).exists()
+      }
 
     orphans.foreach { info =>
       listing.delete(info.getClass(), info.path)
     }
 
+    // Reading level db would trigger table file compaction, then it may cause size of level db
+    // directory changed. When service restarts, "currentUsage" is calculated from real directory
+    // size. Update "ApplicationStoreInfo.size" to ensure "currentUsage" equals
+    // sum of "ApplicationStoreInfo.size".
+    existences.foreach { info =>
+      val fileSize = sizeOf(new File(info.path))
+      if (fileSize != info.size) {
+        listing.write(info.copy(size = fileSize))
+      }
+    }
+
     logInfo("Initialized disk manager: " +
       s"current usage = ${Utils.bytesToString(currentUsage.get())}, " +
       s"max usage = ${Utils.bytesToString(maxUsage)}")
diff --git a/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala
index f78469e..b17880a 100644
--- a/core/src/test/scala/org/
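As a rough illustration of the fix above, here is a simplified sketch of the reconcile-on-startup idea, assuming plain case classes and an in-memory sequence in place of Spark's LevelDB-backed `listing`; `StoreInfo`, `DiskUsageReconciler`, and `reconcile` are hypothetical names, and `sizeOf` is a stand-in for the manager's real directory-size helper.

```scala
import java.io.File

// Hypothetical stand-in for ApplicationStoreInfo: a recorded store path and
// the size that was on record for it when the server last ran.
final case class StoreInfo(path: String, size: Long)

object DiskUsageReconciler {
  // Stand-in for the manager's directory-size helper: total bytes under a dir.
  def sizeOf(dir: File): Long = {
    val children = Option(dir.listFiles()).getOrElse(Array.empty[File])
    children.map(f => if (f.isDirectory) sizeOf(f) else f.length()).sum
  }

  // Split recorded stores into survivors and orphans, then re-measure each
  // survivor: LevelDB compaction can shrink a store directory without the
  // recorded size being updated, so stale sizes are refreshed before the
  // usage accounting (makeRoom) trusts them.
  def reconcile(recorded: Seq[StoreInfo]): (Seq[StoreInfo], Seq[StoreInfo]) = {
    val (existing, orphans) = recorded.partition(info => new File(info.path).exists())
    val refreshed = existing.map { info =>
      val actual = sizeOf(new File(info.path))
      if (actual != info.size) info.copy(size = actual) else info
    }
    (refreshed, orphans)
  }

  def main(args: Array[String]): Unit = {
    val recorded = Seq(StoreInfo("/tmp", 1L), StoreInfo("/no/such/dir", 2L))
    val (kept, orphans) = reconcile(recorded)
    println(s"kept=$kept orphans=$orphans") // /tmp re-measured; the other dropped
  }
}
```

The invariant being restored is the one the init log message reflects: `currentUsage`, computed from the real directories, must equal the sum of the recorded sizes, otherwise the usage tracker can go negative once `makeRoom()` subtracts stale sizes.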
[spark] branch branch-2.4 updated: [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new eddf40d  [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
eddf40d is described below

commit eddf40d5e48ada258dff45661816073cb39eb721
Author: Zhen Li
AuthorDate: Wed Jul 8 21:58:45 2020 +0900

    [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing

    ### What changes were proposed in this pull request?

    Update ApplicationStoreInfo.size to the real directory size during HistoryServerDiskManager initialization.

    ### Why are the changes needed?

    This PR fixes bug [SPARK-32024](https://issues.apache.org/jira/browse/SPARK-32024). We found that after a history server restart, the error below would randomly happen, raised from `HistoryServerDiskManager`:
    "java.lang.IllegalStateException: Disk usage tracker went negative (now = -***, delta = -***)"

    **Cause**: Reading data from LevelDB can trigger table file compaction, which may in turn change the size of the LevelDB directory. This size change may not be recorded in LevelDB (`ApplicationStoreInfo` in `listing`). When the service restarts, `currentUsage` is calculated from the real directory size, but `ApplicationStoreInfo` entries are loaded from LevelDB, so `currentUsage` may be less than the sum of `ApplicationStoreInfo.size`. In the `makeRoom()` function, `ApplicationStoreInfo.size` is used to [...]

    **Reproduce**: we can reproduce this issue in a dev environment by reducing the config values of "spark.history.retainedApplications" and "spark.history.store.maxDiskUsage" to some small values. Steps:
    1. Start the history server, load some applications, and access some pages (maybe the "stages" page, to trigger LevelDB compaction).
    2. Restart the history server and refresh pages.

    I also added a UT to simulate this case in `HistoryServerDiskManagerSuite`.

    **Benefit**: this change helps improve history server reliability.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Added a unit test and manually tested it.

    Closes #28859 from zhli1142015/update-ApplicationStoreInfo.size-during-disk-manager-initialize.

    Authored-by: Zhen Li
    Signed-off-by: Jungtaek Lim (HeartSaVioR)
    (cherry picked from commit 8e7fc04637bbb8d2fdc2c758746e0eaf496c4d92)
    Signed-off-by: Jungtaek Lim (HeartSaVioR)
---
 .../deploy/history/HistoryServerDiskManager.scala  | 21 ++++++++++++++++++---
 .../history/HistoryServerDiskManagerSuite.scala    | 46 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala
index ad0dd23..dc17140 100644
--- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala
@@ -76,14 +76,29 @@ private class HistoryServerDiskManager(
     // Go through the recorded store directories and remove any that may have been removed by
     // external code.
-    val orphans = listing.view(classOf[ApplicationStoreInfo]).asScala.filter { info =>
-      !new File(info.path).exists()
-    }.toSeq
+    val (existences, orphans) = listing
+      .view(classOf[ApplicationStoreInfo])
+      .asScala
+      .toSeq
+      .partition { info =>
+        new File(info.path).exists()
+      }
 
     orphans.foreach { info =>
       listing.delete(info.getClass(), info.path)
     }
 
+    // Reading level db would trigger table file compaction, then it may cause size of level db
+    // directory changed. When service restarts, "currentUsage" is calculated from real directory
+    // size. Update "ApplicationStoreInfo.size" to ensure "currentUsage" equals
+    // sum of "ApplicationStoreInfo.size".
+    existences.foreach { info =>
+      val fileSize = sizeOf(new File(info.path))
+      if (fileSize != info.size) {
+        listing.write(info.copy(size = fileSize))
+      }
+    }
+
     logInfo("Initialized disk manager: " +
       s"current usage = ${Utils.bytesToString(currentUsage.get())}, " +
       s"max usage = ${Utils.bytesToString(maxUsage)}")
diff --git a/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala
index 4b1b921..a050519 100644
--- a/core/src/test/scala/org/
[spark] branch master updated (371b35d -> 8e7fc04)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 371b35d [SPARK-32214][SQL] The type conversion function generated in makeFromJava for "other" type uses a wrong variable add 8e7fc04 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing No new revisions were added by this update. Summary of changes: .../deploy/history/HistoryServerDiskManager.scala | 21 -- .../history/HistoryServerDiskManagerSuite.scala| 46 ++ 2 files changed, 64 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new ac2c6cd3 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing ac2c6cd3 is described below commit ac2c6cd3cfae369f6d1af6abea567263d34a29b2 Author: Zhen Li AuthorDate: Wed Jul 8 21:58:45 2020 +0900 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing ### What changes were proposed in this pull request? Update ApplicationStoreInfo.size to real size during HistoryServerDiskManager initializing. ### Why are the changes needed? This PR is for fixing bug [32024](https://issues.apache.org/jira/browse/SPARK-32024). We found after history server restart, below error would randomly happen: "java.lang.IllegalStateException: Disk usage tracker went negative (now = -***, delta = -***)" from `HistoryServerDiskManager`.  **Cause**: Reading data from level db would trigger table file compaction, which may also trigger size of level db directory changes. This size change may not be recorded in LevelDB (`ApplicationStoreInfo` in `listing`). When service restarts, `currentUsage` is calculated from real directory size, but `ApplicationStoreInfo` are loaded from leveldb, then `currentUsage` may be less then sum of `ApplicationStoreInfo.size`. In `makeRoom()` function, `ApplicationStoreInfo.size` is used to [...] **Reproduce**: we can reproduce this issue in dev environment by reducing config value of "spark.history.retainedApplications" and "spark.history.store.maxDiskUsage" to some small values. Here are steps: 1. start history server, load some applications and access some pages (maybe "stages" page to trigger leveldb compaction). 2. restart HS, and refresh pages. I also added an UT to simulate this case in `HistoryServerDiskManagerSuite`. **Benefit**: this change would help improve history server reliability. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add unit test and manually tested it. Closes #28859 from zhli1142015/update-ApplicationStoreInfo.size-during-disk-manager-initialize. Authored-by: Zhen Li Signed-off-by: Jungtaek Lim (HeartSaVioR) (cherry picked from commit 8e7fc04637bbb8d2fdc2c758746e0eaf496c4d92) Signed-off-by: Jungtaek Lim (HeartSaVioR) --- .../deploy/history/HistoryServerDiskManager.scala | 21 -- .../history/HistoryServerDiskManagerSuite.scala| 46 ++ 2 files changed, 64 insertions(+), 3 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala index 0a1f333..4ddd6d9 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala @@ -75,14 +75,29 @@ private class HistoryServerDiskManager( // Go through the recorded store directories and remove any that may have been removed by // external code. 
-val orphans = listing.view(classOf[ApplicationStoreInfo]).asScala.filter { info => - !new File(info.path).exists() -}.toSeq +val (existences, orphans) = listing + .view(classOf[ApplicationStoreInfo]) + .asScala + .toSeq + .partition { info => +new File(info.path).exists() + } orphans.foreach { info => listing.delete(info.getClass(), info.path) } +// Reading level db would trigger table file compaction, then it may cause size of level db +// directory changed. When service restarts, "currentUsage" is calculated from real directory +// size. Update "ApplicationStoreInfo.size" to ensure "currentUsage" equals +// sum of "ApplicationStoreInfo.size". +existences.foreach { info => + val fileSize = sizeOf(new File(info.path)) + if (fileSize != info.size) { +listing.write(info.copy(size = fileSize)) + } +} + logInfo("Initialized disk manager: " + s"current usage = ${Utils.bytesToString(currentUsage.get())}, " + s"max usage = ${Utils.bytesToString(maxUsage)}") diff --git a/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala index f78469e..b17880a 100644 --- a/core/src/test/scala/org/
[spark] branch branch-2.4 updated: [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new eddf40d [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing eddf40d is described below commit eddf40d5e48ada258dff45661816073cb39eb721 Author: Zhen Li AuthorDate: Wed Jul 8 21:58:45 2020 +0900 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing ### What changes were proposed in this pull request? Update ApplicationStoreInfo.size to real size during HistoryServerDiskManager initializing. ### Why are the changes needed? This PR is for fixing bug [32024](https://issues.apache.org/jira/browse/SPARK-32024). We found after history server restart, below error would randomly happen: "java.lang.IllegalStateException: Disk usage tracker went negative (now = -***, delta = -***)" from `HistoryServerDiskManager`.  **Cause**: Reading data from level db would trigger table file compaction, which may also trigger size of level db directory changes. This size change may not be recorded in LevelDB (`ApplicationStoreInfo` in `listing`). When service restarts, `currentUsage` is calculated from real directory size, but `ApplicationStoreInfo` are loaded from leveldb, then `currentUsage` may be less then sum of `ApplicationStoreInfo.size`. In `makeRoom()` function, `ApplicationStoreInfo.size` is used to [...] **Reproduce**: we can reproduce this issue in dev environment by reducing config value of "spark.history.retainedApplications" and "spark.history.store.maxDiskUsage" to some small values. Here are steps: 1. start history server, load some applications and access some pages (maybe "stages" page to trigger leveldb compaction). 2. restart HS, and refresh pages. I also added an UT to simulate this case in `HistoryServerDiskManagerSuite`. **Benefit**: this change would help improve history server reliability. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add unit test and manually tested it. Closes #28859 from zhli1142015/update-ApplicationStoreInfo.size-during-disk-manager-initialize. Authored-by: Zhen Li Signed-off-by: Jungtaek Lim (HeartSaVioR) (cherry picked from commit 8e7fc04637bbb8d2fdc2c758746e0eaf496c4d92) Signed-off-by: Jungtaek Lim (HeartSaVioR) --- .../deploy/history/HistoryServerDiskManager.scala | 21 -- .../history/HistoryServerDiskManagerSuite.scala| 46 ++ 2 files changed, 64 insertions(+), 3 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala index ad0dd23..dc17140 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala @@ -76,14 +76,29 @@ private class HistoryServerDiskManager( // Go through the recorded store directories and remove any that may have been removed by // external code. 
-val orphans = listing.view(classOf[ApplicationStoreInfo]).asScala.filter { info => - !new File(info.path).exists() -}.toSeq +val (existences, orphans) = listing + .view(classOf[ApplicationStoreInfo]) + .asScala + .toSeq + .partition { info => +new File(info.path).exists() + } orphans.foreach { info => listing.delete(info.getClass(), info.path) } +// Reading level db would trigger table file compaction, then it may cause size of level db +// directory changed. When service restarts, "currentUsage" is calculated from real directory +// size. Update "ApplicationStoreInfo.size" to ensure "currentUsage" equals +// sum of "ApplicationStoreInfo.size". +existences.foreach { info => + val fileSize = sizeOf(new File(info.path)) + if (fileSize != info.size) { +listing.write(info.copy(size = fileSize)) + } +} + logInfo("Initialized disk manager: " + s"current usage = ${Utils.bytesToString(currentUsage.get())}, " + s"max usage = ${Utils.bytesToString(maxUsage)}") diff --git a/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala index 4b1b921..a050519 100644 --- a/core/src/test/scala/org/
[spark] branch master updated (371b35d -> 8e7fc04)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 371b35d [SPARK-32214][SQL] The type conversion function generated in makeFromJava for "other" type uses a wrong variable add 8e7fc04 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing No new revisions were added by this update. Summary of changes: .../deploy/history/HistoryServerDiskManager.scala | 21 -- .../history/HistoryServerDiskManagerSuite.scala| 46 ++ 2 files changed, 64 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new ac2c6cd3 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing ac2c6cd3 is described below commit ac2c6cd3cfae369f6d1af6abea567263d34a29b2 Author: Zhen Li AuthorDate: Wed Jul 8 21:58:45 2020 +0900 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing ### What changes were proposed in this pull request? Update ApplicationStoreInfo.size to real size during HistoryServerDiskManager initializing. ### Why are the changes needed? This PR is for fixing bug [32024](https://issues.apache.org/jira/browse/SPARK-32024). We found after history server restart, below error would randomly happen: "java.lang.IllegalStateException: Disk usage tracker went negative (now = -***, delta = -***)" from `HistoryServerDiskManager`.  **Cause**: Reading data from level db would trigger table file compaction, which may also trigger size of level db directory changes. This size change may not be recorded in LevelDB (`ApplicationStoreInfo` in `listing`). When service restarts, `currentUsage` is calculated from real directory size, but `ApplicationStoreInfo` are loaded from leveldb, then `currentUsage` may be less then sum of `ApplicationStoreInfo.size`. In `makeRoom()` function, `ApplicationStoreInfo.size` is used to [...] **Reproduce**: we can reproduce this issue in dev environment by reducing config value of "spark.history.retainedApplications" and "spark.history.store.maxDiskUsage" to some small values. Here are steps: 1. start history server, load some applications and access some pages (maybe "stages" page to trigger leveldb compaction). 2. restart HS, and refresh pages. I also added an UT to simulate this case in `HistoryServerDiskManagerSuite`. **Benefit**: this change would help improve history server reliability. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add unit test and manually tested it. Closes #28859 from zhli1142015/update-ApplicationStoreInfo.size-during-disk-manager-initialize. Authored-by: Zhen Li Signed-off-by: Jungtaek Lim (HeartSaVioR) (cherry picked from commit 8e7fc04637bbb8d2fdc2c758746e0eaf496c4d92) Signed-off-by: Jungtaek Lim (HeartSaVioR) --- .../deploy/history/HistoryServerDiskManager.scala | 21 -- .../history/HistoryServerDiskManagerSuite.scala| 46 ++ 2 files changed, 64 insertions(+), 3 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala index 0a1f333..4ddd6d9 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala @@ -75,14 +75,29 @@ private class HistoryServerDiskManager( // Go through the recorded store directories and remove any that may have been removed by // external code. 
-val orphans = listing.view(classOf[ApplicationStoreInfo]).asScala.filter { info => - !new File(info.path).exists() -}.toSeq +val (existences, orphans) = listing + .view(classOf[ApplicationStoreInfo]) + .asScala + .toSeq + .partition { info => +new File(info.path).exists() + } orphans.foreach { info => listing.delete(info.getClass(), info.path) } +// Reading level db would trigger table file compaction, then it may cause size of level db +// directory changed. When service restarts, "currentUsage" is calculated from real directory +// size. Update "ApplicationStoreInfo.size" to ensure "currentUsage" equals +// sum of "ApplicationStoreInfo.size". +existences.foreach { info => + val fileSize = sizeOf(new File(info.path)) + if (fileSize != info.size) { +listing.write(info.copy(size = fileSize)) + } +} + logInfo("Initialized disk manager: " + s"current usage = ${Utils.bytesToString(currentUsage.get())}, " + s"max usage = ${Utils.bytesToString(maxUsage)}") diff --git a/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala index f78469e..b17880a 100644 --- a/core/src/test/scala/org/
[spark] branch branch-2.4 updated: [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new eddf40d [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing eddf40d is described below commit eddf40d5e48ada258dff45661816073cb39eb721 Author: Zhen Li AuthorDate: Wed Jul 8 21:58:45 2020 +0900 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing ### What changes were proposed in this pull request? Update ApplicationStoreInfo.size to real size during HistoryServerDiskManager initializing. ### Why are the changes needed? This PR is for fixing bug [32024](https://issues.apache.org/jira/browse/SPARK-32024). We found after history server restart, below error would randomly happen: "java.lang.IllegalStateException: Disk usage tracker went negative (now = -***, delta = -***)" from `HistoryServerDiskManager`.  **Cause**: Reading data from level db would trigger table file compaction, which may also trigger size of level db directory changes. This size change may not be recorded in LevelDB (`ApplicationStoreInfo` in `listing`). When service restarts, `currentUsage` is calculated from real directory size, but `ApplicationStoreInfo` are loaded from leveldb, then `currentUsage` may be less then sum of `ApplicationStoreInfo.size`. In `makeRoom()` function, `ApplicationStoreInfo.size` is used to [...] **Reproduce**: we can reproduce this issue in dev environment by reducing config value of "spark.history.retainedApplications" and "spark.history.store.maxDiskUsage" to some small values. Here are steps: 1. start history server, load some applications and access some pages (maybe "stages" page to trigger leveldb compaction). 2. restart HS, and refresh pages. I also added an UT to simulate this case in `HistoryServerDiskManagerSuite`. **Benefit**: this change would help improve history server reliability. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add unit test and manually tested it. Closes #28859 from zhli1142015/update-ApplicationStoreInfo.size-during-disk-manager-initialize. Authored-by: Zhen Li Signed-off-by: Jungtaek Lim (HeartSaVioR) (cherry picked from commit 8e7fc04637bbb8d2fdc2c758746e0eaf496c4d92) Signed-off-by: Jungtaek Lim (HeartSaVioR) --- .../deploy/history/HistoryServerDiskManager.scala | 21 -- .../history/HistoryServerDiskManagerSuite.scala| 46 ++ 2 files changed, 64 insertions(+), 3 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala index ad0dd23..dc17140 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala @@ -76,14 +76,29 @@ private class HistoryServerDiskManager( // Go through the recorded store directories and remove any that may have been removed by // external code. 
-val orphans = listing.view(classOf[ApplicationStoreInfo]).asScala.filter { info => - !new File(info.path).exists() -}.toSeq +val (existences, orphans) = listing + .view(classOf[ApplicationStoreInfo]) + .asScala + .toSeq + .partition { info => +new File(info.path).exists() + } orphans.foreach { info => listing.delete(info.getClass(), info.path) } +// Reading level db would trigger table file compaction, then it may cause size of level db +// directory changed. When service restarts, "currentUsage" is calculated from real directory +// size. Update "ApplicationStoreInfo.size" to ensure "currentUsage" equals +// sum of "ApplicationStoreInfo.size". +existences.foreach { info => + val fileSize = sizeOf(new File(info.path)) + if (fileSize != info.size) { +listing.write(info.copy(size = fileSize)) + } +} + logInfo("Initialized disk manager: " + s"current usage = ${Utils.bytesToString(currentUsage.get())}, " + s"max usage = ${Utils.bytesToString(maxUsage)}") diff --git a/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala index 4b1b921..a050519 100644 --- a/core/src/test/scala/org/
[spark] branch master updated (371b35d -> 8e7fc04)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 371b35d [SPARK-32214][SQL] The type conversion function generated in makeFromJava for "other" type uses a wrong variable add 8e7fc04 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing No new revisions were added by this update. Summary of changes: .../deploy/history/HistoryServerDiskManager.scala | 21 -- .../history/HistoryServerDiskManagerSuite.scala| 46 ++ 2 files changed, 64 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new ac2c6cd3 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing ac2c6cd3 is described below commit ac2c6cd3cfae369f6d1af6abea567263d34a29b2 Author: Zhen Li AuthorDate: Wed Jul 8 21:58:45 2020 +0900 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing ### What changes were proposed in this pull request? Update ApplicationStoreInfo.size to real size during HistoryServerDiskManager initializing. ### Why are the changes needed? This PR is for fixing bug [32024](https://issues.apache.org/jira/browse/SPARK-32024). We found after history server restart, below error would randomly happen: "java.lang.IllegalStateException: Disk usage tracker went negative (now = -***, delta = -***)" from `HistoryServerDiskManager`.  **Cause**: Reading data from level db would trigger table file compaction, which may also trigger size of level db directory changes. This size change may not be recorded in LevelDB (`ApplicationStoreInfo` in `listing`). When service restarts, `currentUsage` is calculated from real directory size, but `ApplicationStoreInfo` are loaded from leveldb, then `currentUsage` may be less then sum of `ApplicationStoreInfo.size`. In `makeRoom()` function, `ApplicationStoreInfo.size` is used to [...] **Reproduce**: we can reproduce this issue in dev environment by reducing config value of "spark.history.retainedApplications" and "spark.history.store.maxDiskUsage" to some small values. Here are steps: 1. start history server, load some applications and access some pages (maybe "stages" page to trigger leveldb compaction). 2. restart HS, and refresh pages. I also added an UT to simulate this case in `HistoryServerDiskManagerSuite`. **Benefit**: this change would help improve history server reliability. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add unit test and manually tested it. Closes #28859 from zhli1142015/update-ApplicationStoreInfo.size-during-disk-manager-initialize. Authored-by: Zhen Li Signed-off-by: Jungtaek Lim (HeartSaVioR) (cherry picked from commit 8e7fc04637bbb8d2fdc2c758746e0eaf496c4d92) Signed-off-by: Jungtaek Lim (HeartSaVioR) --- .../deploy/history/HistoryServerDiskManager.scala | 21 -- .../history/HistoryServerDiskManagerSuite.scala| 46 ++ 2 files changed, 64 insertions(+), 3 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala index 0a1f333..4ddd6d9 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala @@ -75,14 +75,29 @@ private class HistoryServerDiskManager( // Go through the recorded store directories and remove any that may have been removed by // external code. 
-val orphans = listing.view(classOf[ApplicationStoreInfo]).asScala.filter { info => - !new File(info.path).exists() -}.toSeq +val (existences, orphans) = listing + .view(classOf[ApplicationStoreInfo]) + .asScala + .toSeq + .partition { info => +new File(info.path).exists() + } orphans.foreach { info => listing.delete(info.getClass(), info.path) } +// Reading level db would trigger table file compaction, then it may cause size of level db +// directory changed. When service restarts, "currentUsage" is calculated from real directory +// size. Update "ApplicationStoreInfo.size" to ensure "currentUsage" equals +// sum of "ApplicationStoreInfo.size". +existences.foreach { info => + val fileSize = sizeOf(new File(info.path)) + if (fileSize != info.size) { +listing.write(info.copy(size = fileSize)) + } +} + logInfo("Initialized disk manager: " + s"current usage = ${Utils.bytesToString(currentUsage.get())}, " + s"max usage = ${Utils.bytesToString(maxUsage)}") diff --git a/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala index f78469e..b17880a 100644 --- a/core/src/test/scala/org/
[spark] branch branch-2.4 updated: [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new eddf40d [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing eddf40d is described below commit eddf40d5e48ada258dff45661816073cb39eb721 Author: Zhen Li AuthorDate: Wed Jul 8 21:58:45 2020 +0900 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing ### What changes were proposed in this pull request? Update ApplicationStoreInfo.size to real size during HistoryServerDiskManager initializing. ### Why are the changes needed? This PR is for fixing bug [32024](https://issues.apache.org/jira/browse/SPARK-32024). We found after history server restart, below error would randomly happen: "java.lang.IllegalStateException: Disk usage tracker went negative (now = -***, delta = -***)" from `HistoryServerDiskManager`.  **Cause**: Reading data from level db would trigger table file compaction, which may also trigger size of level db directory changes. This size change may not be recorded in LevelDB (`ApplicationStoreInfo` in `listing`). When service restarts, `currentUsage` is calculated from real directory size, but `ApplicationStoreInfo` are loaded from leveldb, then `currentUsage` may be less then sum of `ApplicationStoreInfo.size`. In `makeRoom()` function, `ApplicationStoreInfo.size` is used to [...] **Reproduce**: we can reproduce this issue in dev environment by reducing config value of "spark.history.retainedApplications" and "spark.history.store.maxDiskUsage" to some small values. Here are steps: 1. start history server, load some applications and access some pages (maybe "stages" page to trigger leveldb compaction). 2. restart HS, and refresh pages. I also added an UT to simulate this case in `HistoryServerDiskManagerSuite`. **Benefit**: this change would help improve history server reliability. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add unit test and manually tested it. Closes #28859 from zhli1142015/update-ApplicationStoreInfo.size-during-disk-manager-initialize. Authored-by: Zhen Li Signed-off-by: Jungtaek Lim (HeartSaVioR) (cherry picked from commit 8e7fc04637bbb8d2fdc2c758746e0eaf496c4d92) Signed-off-by: Jungtaek Lim (HeartSaVioR) --- .../deploy/history/HistoryServerDiskManager.scala | 21 -- .../history/HistoryServerDiskManagerSuite.scala| 46 ++ 2 files changed, 64 insertions(+), 3 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala index ad0dd23..dc17140 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala @@ -76,14 +76,29 @@ private class HistoryServerDiskManager( // Go through the recorded store directories and remove any that may have been removed by // external code. 
-val orphans = listing.view(classOf[ApplicationStoreInfo]).asScala.filter { info => - !new File(info.path).exists() -}.toSeq +val (existences, orphans) = listing + .view(classOf[ApplicationStoreInfo]) + .asScala + .toSeq + .partition { info => +new File(info.path).exists() + } orphans.foreach { info => listing.delete(info.getClass(), info.path) } +// Reading level db would trigger table file compaction, then it may cause size of level db +// directory changed. When service restarts, "currentUsage" is calculated from real directory +// size. Update "ApplicationStoreInfo.size" to ensure "currentUsage" equals +// sum of "ApplicationStoreInfo.size". +existences.foreach { info => + val fileSize = sizeOf(new File(info.path)) + if (fileSize != info.size) { +listing.write(info.copy(size = fileSize)) + } +} + logInfo("Initialized disk manager: " + s"current usage = ${Utils.bytesToString(currentUsage.get())}, " + s"max usage = ${Utils.bytesToString(maxUsage)}") diff --git a/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala index 4b1b921..a050519 100644 --- a/core/src/test/scala/org/
[spark] branch branch-3.0 updated: [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
new ac2c6cd3 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing

ac2c6cd3 is described below

commit ac2c6cd3cfae369f6d1af6abea567263d34a29b2
Author: Zhen Li
AuthorDate: Wed Jul 8 21:58:45 2020 +0900

[SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing

### What changes were proposed in this pull request?
Update ApplicationStoreInfo.size to the real directory size during HistoryServerDiskManager initialization.

### Why are the changes needed?
This PR fixes bug [32024](https://issues.apache.org/jira/browse/SPARK-32024). We found that, after a history server restart, the following error would randomly happen: "java.lang.IllegalStateException: Disk usage tracker went negative (now = -***, delta = -***)" from `HistoryServerDiskManager`.

**Cause**: Reading data from LevelDB can trigger table file compaction, which may in turn change the size of the LevelDB directory. This size change may not be recorded back into LevelDB (`ApplicationStoreInfo` in `listing`). When the service restarts, `currentUsage` is calculated from the real directory size, but the `ApplicationStoreInfo` entries are loaded from LevelDB, so `currentUsage` may be less than the sum of `ApplicationStoreInfo.size`. In the `makeRoom()` function, `ApplicationStoreInfo.size` is used to [...]

**Reproduce**: we can reproduce this issue in a dev environment by reducing the config values of "spark.history.retainedApplications" and "spark.history.store.maxDiskUsage" to some small values. Here are the steps: 1. start the history server, load some applications and access some pages (maybe the "stages" page, to trigger LevelDB compaction). 2. restart the history server and refresh the pages. I also added a UT to simulate this case in `HistoryServerDiskManagerSuite`.

**Benefit**: this change helps improve history server reliability.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a unit test and manually tested it.

Closes #28859 from zhli1142015/update-ApplicationStoreInfo.size-during-disk-manager-initialize.

Authored-by: Zhen Li
Signed-off-by: Jungtaek Lim (HeartSaVioR)
(cherry picked from commit 8e7fc04637bbb8d2fdc2c758746e0eaf496c4d92)
Signed-off-by: Jungtaek Lim (HeartSaVioR)
---
.../deploy/history/HistoryServerDiskManager.scala | 21 --
.../history/HistoryServerDiskManagerSuite.scala| 46 ++
2 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala
index 0a1f333..4ddd6d9 100644
--- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala
@@ -75,14 +75,29 @@ private class HistoryServerDiskManager(
     // Go through the recorded store directories and remove any that may have been removed by
     // external code.
-    val orphans = listing.view(classOf[ApplicationStoreInfo]).asScala.filter { info =>
-      !new File(info.path).exists()
-    }.toSeq
+    val (existences, orphans) = listing
+      .view(classOf[ApplicationStoreInfo])
+      .asScala
+      .toSeq
+      .partition { info =>
+        new File(info.path).exists()
+      }

     orphans.foreach { info =>
       listing.delete(info.getClass(), info.path)
     }

+    // Reading level db would trigger table file compaction, then it may cause size of level db
+    // directory changed. When service restarts, "currentUsage" is calculated from real directory
+    // size. Update "ApplicationStoreInfo.size" to ensure "currentUsage" equals
+    // sum of "ApplicationStoreInfo.size".
+    existences.foreach { info =>
+      val fileSize = sizeOf(new File(info.path))
+      if (fileSize != info.size) {
+        listing.write(info.copy(size = fileSize))
+      }
+    }
+
     logInfo("Initialized disk manager: " +
       s"current usage = ${Utils.bytesToString(currentUsage.get())}, " +
       s"max usage = ${Utils.bytesToString(maxUsage)}")

diff --git a/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala
index f78469e..b17880a 100644
--- a/core/src/test/scala/org/
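The essence of the fix is easier to see outside the diff. Below is a minimal, self-contained sketch of the same reconciliation pattern; `StoreInfo` and `reconcile` are illustrative stand-ins, not the real `HistoryServerDiskManager` types:

```scala
import java.io.File

// Hypothetical stand-in for ApplicationStoreInfo: a recorded store path plus
// the size last written to the listing database.
case class StoreInfo(path: String, size: Long)

// On startup: split the recorded entries into those whose directories still
// exist and orphans, then refresh the recorded sizes from disk so that the
// usage tracker's invariant (currentUsage == sum of recorded sizes) holds.
def reconcile(recorded: Seq[StoreInfo], sizeOf: File => Long): (Seq[StoreInfo], Seq[StoreInfo]) = {
  val (existing, orphans) = recorded.partition(info => new File(info.path).exists())
  val refreshed = existing.map { info =>
    val actual = sizeOf(new File(info.path))
    if (actual != info.size) info.copy(size = actual) else info
  }
  (refreshed, orphans) // the orphans are deleted from the listing by the caller
}
```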
[spark] branch master updated (4609f1f -> e6e43cb)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 4609f1f [SPARK-32207][SQL] Support 'F'-suffixed Float Literals add e6e43cb [SPARK-32242][SQL] CliSuite flakiness fix via differentiating cli driver bootup timeout and query execution timeout No new revisions were added by this update. Summary of changes: .../spark/sql/hive/thriftserver/CliSuite.scala | 22 +++--- 1 file changed, 19 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
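The patch itself is not quoted above, so as context here is a hedged sketch of the idea named in the title: give process start-up its own, longer deadline instead of sharing one timeout between booting the CLI driver and answering queries. `awaitCondition` and the budgets below are illustrative, not the actual CliSuite helpers:

```scala
import scala.concurrent.duration._

// Poll a condition until it holds or its own deadline expires.
def awaitCondition(timeout: FiniteDuration)(condition: => Boolean): Unit = {
  val deadline = timeout.fromNow
  while (!condition) {
    if (deadline.isOverdue()) throw new RuntimeException(s"timed out after $timeout")
    Thread.sleep(100)
  }
}

// Boot-up (forking a JVM, initializing a metastore) can legitimately take far
// longer than any single query, so the two phases get independent budgets
// rather than one shared timeout that flakes under load.
def runCliCheck(sawPrompt: => Boolean, sawAnswer: => Boolean): Unit = {
  awaitCondition(timeout = 5.minutes)(sawPrompt) // driver boot-up budget
  awaitCondition(timeout = 1.minute)(sawAnswer)  // query execution budget
}
```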
[spark] branch master updated (0c9196e -> 1b3fc9a)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 0c9196e [SPARK-32238][SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in ScalaUDF add 1b3fc9a [SPARK-32149][SHUFFLE] Improve file path name normalisation at block resolution within the external shuffle service No new revisions were added by this update. Summary of changes: .../spark/network/shuffle/ExecutorDiskUtils.java | 52 +- .../shuffle/ExternalShuffleBlockResolver.java | 3 -- .../shuffle/ExternalShuffleBlockResolverSuite.java | 27 --- 3 files changed, 10 insertions(+), 72 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
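For context on what the resolver normalises: the external shuffle service has to rebuild the same file path an executor used when writing a block. A hedged sketch of the well-known hashed-subdirectory layout follows; it is illustrative, not a copy of `ExecutorDiskUtils`:

```scala
import java.io.File

// A block file lives under one of the executor's local dirs, inside a
// sub-directory chosen from a non-negative hash of the file name. The shuffle
// service must derive exactly the same path from the same inputs.
def resolveBlockFile(localDirs: Array[String], subDirsPerLocalDir: Int, filename: String): File = {
  val hash = filename.hashCode & Integer.MAX_VALUE            // force non-negative
  val localDir = localDirs(hash % localDirs.length)           // pick the local dir
  val subDir = (hash / localDirs.length) % subDirsPerLocalDir // pick the sub-dir
  new File(new File(localDir, f"$subDir%02x"), filename)
}
```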
[spark] branch master updated (c4b0639 -> ad90cbf)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c4b0639 [SPARK-32270][SQL] Use TextFileFormat in CSV's schema inference with a different encoding add ad90cbf [SPARK-31831][SQL][TESTS] Use subclasses for mock in HiveSessionImplSuite No new revisions were added by this update. Summary of changes: .../hive/thriftserver/HiveSessionImplSuite.scala | 86 -- 1 file changed, 63 insertions(+), 23 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
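The title names the technique: rather than conjuring a framework mock of a Hive class (whose constructors differ across the supported Hive versions), the suite hands the code under test a small hand-written subclass. A generic, hedged sketch with illustrative names:

```scala
// Production-shaped class with behaviour the test does not want to run.
abstract class Operation(val statement: String) {
  def run(): Unit
  var closed = false
  def close(): Unit = { closed = true }
}

// Test double: a plain subclass that records interactions instead of doing
// real work. Unlike a reflective mock, it compiles against whichever parent
// signature the active Hive profile provides.
class OperationStub(statement: String) extends Operation(statement) {
  var ran = false
  override def run(): Unit = { ran = true }
}

val op = new OperationStub("SELECT 1")
op.run()
op.close()
assert(op.ran && op.closed)
```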
[spark] branch master updated (c602d79 -> 90b0c26)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c602d79 [SPARK-32311][PYSPARK][TESTS] Remove duplicate import add 90b0c26 [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster No new revisions were added by this update. Summary of changes: .../spark/deploy/history/FsHistoryProvider.scala | 99 +++ .../history/HistoryServerMemoryManager.scala | 85 ++ .../apache/spark/deploy/history/HybridStore.scala | 185 + .../org/apache/spark/internal/config/History.scala | 16 ++ docs/monitoring.md | 19 +++ 5 files changed, 404 insertions(+) create mode 100644 core/src/main/scala/org/apache/spark/deploy/history/HistoryServerMemoryManager.scala create mode 100644 core/src/main/scala/org/apache/spark/deploy/history/HybridStore.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
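The idea behind the new store, sketched under stated assumptions (the real `HybridStore` implements Spark's `KVStore` interface and is more involved): serve reads from a fast in-memory store while the data is copied to the disk-backed store in the background, then flip reads over once the copy finishes.

```scala
trait KV {
  def read(key: String): Option[String]
  def write(key: String, value: String): Unit
}

class InMemoryKV extends KV {
  private val data = scala.collection.concurrent.TrieMap.empty[String, String]
  def read(key: String): Option[String] = data.get(key)
  def write(key: String, value: String): Unit = data.put(key, value)
  def entries: Iterator[(String, String)] = data.iterator
}

// Writes and reads hit memory while the event log is being replayed; once
// replay finishes, the contents are dumped to the disk store on a background
// thread and subsequent reads are served from disk.
class HybridKV(memory: InMemoryKV, disk: KV) extends KV {
  @volatile private var current: KV = memory

  def read(key: String): Option[String] = current.read(key)
  def write(key: String, value: String): Unit = current.write(key, value)

  def switchToDisk(): Thread = {
    val t = new Thread(() => {
      memory.entries.foreach { case (k, v) => disk.write(k, v) }
      current = disk // future reads hit the disk store
    })
    t.start()
    t
  }
}
```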
[spark] branch master updated (af8e65f -> 542aefb)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from af8e65f [SPARK-32276][SQL] Remove redundant sorts before repartition nodes add 542aefb [SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in continuous mode No new revisions were added by this update. Summary of changes: .../analysis/UnsupportedOperationChecker.scala | 11 - .../datasources/v2/DataSourceV2Strategy.scala | 14 +- .../continuous/ContinuousCoalesceExec.scala| 45 --- .../continuous/ContinuousCoalesceRDD.scala | 137 --- .../streaming/continuous/ContinuousExecution.scala | 4 - .../shuffle/ContinuousShuffleReadRDD.scala | 80 .../shuffle/ContinuousShuffleReader.scala | 32 -- .../shuffle/ContinuousShuffleWriter.scala | 27 -- .../shuffle/RPCContinuousShuffleReader.scala | 138 --- .../shuffle/RPCContinuousShuffleWriter.scala | 60 --- .../execution/streaming/state/StateStoreRDD.scala | 14 +- .../shuffle/ContinuousShuffleSuite.scala | 423 - .../continuous/ContinuousAggregationSuite.scala| 134 --- 13 files changed, 2 insertions(+), 1117 deletions(-) delete mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousCoalesceExec.scala delete mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousCoalesceRDD.scala delete mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleReadRDD.scala delete mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleReader.scala delete mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleWriter.scala delete mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/RPCContinuousShuffleReader.scala delete mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/RPCContinuousShuffleWriter.scala delete mode 100644 sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleSuite.scala delete mode 100644 sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousAggregationSuite.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
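What remains user-visible after the removal: continuous mode keeps supporting only map-like operations, and a streaming aggregation under a continuous trigger is rejected up front. A hedged sketch of the call that fails (exception message paraphrased, not quoted from Spark):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().master("local[2]").appName("demo").getOrCreate()

// A streaming aggregation over the built-in rate source.
val counts = spark.readStream
  .format("rate")
  .load()
  .groupBy("value")
  .count()

// Fails at query start with an AnalysisException along the lines of
// "continuous processing does not support Aggregate operations".
counts.writeStream
  .format("console")
  .outputMode("complete")
  .trigger(Trigger.Continuous("1 second"))
  .start()
```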
[spark] branch master updated (fb51925 -> 9747e8f)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from fb51925 [SPARK-32335][K8S][TESTS] Remove Python2 test from K8s IT add 9747e8f [SPARK-31831][SQL][TESTS][FOLLOWUP] Put mocks for HiveSessionImplSuite in hive version related subdirectories No new revisions were added by this update. Summary of changes: sql/hive-thriftserver/pom.xml | 12 ++ .../hive/thriftserver/HiveSessionImplSuite.scala | 29 + .../thriftserver/GetCatalogsOperationMock.scala| 50 ++ .../thriftserver/GetCatalogsOperationMock.scala| 50 ++ 4 files changed, 113 insertions(+), 28 deletions(-) create mode 100644 sql/hive-thriftserver/v1.2/src/test/scala/org/apache/spark/sql/hive/thriftserver/GetCatalogsOperationMock.scala create mode 100644 sql/hive-thriftserver/v2.3/src/test/scala/org/apache/spark/sql/hive/thriftserver/GetCatalogsOperationMock.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (39181ff -> 7b9d755)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 39181ff [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable add 7b9d755 [SPARK-32350][CORE] Add batch-write on LevelDB to improve performance of HybridStore No new revisions were added by this update. Summary of changes: .../org/apache/spark/util/kvstore/LevelDB.java | 73 ++ .../apache/spark/deploy/history/HybridStore.scala | 9 +-- 2 files changed, 66 insertions(+), 16 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
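The batch-write primitive being wired in is the standard LevelDB one. A sketch against the `org.iq80.leveldb` API directly (Spark's own kvstore `LevelDB` wrapper differs, and the path below is illustrative): grouping many puts into a single `WriteBatch` amortises the per-write overhead, which is what speeds up dumping the in-memory store to disk.

```scala
import java.io.File
import java.nio.charset.StandardCharsets.UTF_8
import org.iq80.leveldb.Options
import org.iq80.leveldb.impl.Iq80DBFactory.factory

val db = factory.open(new File("/tmp/demo-leveldb"), new Options().createIfMissing(true))
try {
  val batch = db.createWriteBatch()
  try {
    (1 to 1000).foreach { i =>
      batch.put(s"key-$i".getBytes(UTF_8), s"value-$i".getBytes(UTF_8))
    }
    db.write(batch) // one buffered write instead of 1000 individual ones
  } finally {
    batch.close()
  }
} finally {
  db.close()
}
```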
[spark] branch master updated (9d7b1d9 -> f602782)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9d7b1d9 [SPARK-32175][SPARK-32175][FOLLOWUP] Remove flaky test added in add f602782 [SPARK-32482][SS][TESTS] Eliminate deprecated poll(long) API calls to avoid infinite wait in tests No new revisions were added by this update. Summary of changes: .../apache/spark/sql/kafka010/KafkaTestUtils.scala | 47 +++--- 1 file changed, 14 insertions(+), 33 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
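The root cause the title points at is worth spelling out: `KafkaConsumer.poll(long)` (deprecated since Kafka 2.0) only bounds the wait for records and can block indefinitely while fetching metadata, which is exactly how a test hangs forever; `poll(java.time.Duration)` bounds the whole call. A minimal consumer sketch (broker address and topic are illustrative):

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("group.id", "demo")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(Collections.singletonList("topic"))

// consumer.poll(1000L)  // deprecated: may block far beyond 1s on metadata
val records = consumer.poll(Duration.ofMillis(1000)) // returns within the timeout
records.forEach(r => println(s"${r.key} -> ${r.value}"))
consumer.close()
```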
[spark] branch master updated (ae82768 -> 813532d)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from ae82768 [SPARK-32421][SQL] Add code-gen for shuffled hash join add 813532d [SPARK-32468][SS][TESTS] Fix timeout config issue in Kafka connector tests No new revisions were added by this update. Summary of changes: .../apache/spark/sql/kafka010/KafkaContinuousSourceSuite.scala| 2 +- .../apache/spark/sql/kafka010/KafkaDontFailOnDataLossSuite.scala | 2 +- .../apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala| 8 3 files changed, 6 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
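As context for where such timeouts live (the exact keys touched by this patch are only visible in the linked suites): the Spark Kafka source forwards any option prefixed with `kafka.` verbatim to the underlying client, so a mis-specified timeout key or value silently misconfigures the consumer. An illustrative sketch:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").appName("demo").getOrCreate()

val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "topic")
  // forwarded verbatim to the Kafka consumer config "request.timeout.ms"
  .option("kafka.request.timeout.ms", "30000")
  .load()
```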
[spark] branch master updated (12f4331 -> 8b26c69)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 12f4331 [SPARK-32672][SQL] Fix data corruption in boolean bit set compression add 8b26c69 [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations No new revisions were added by this update. Summary of changes: docs/web-ui.md | 10 +- .../spark/sql/execution/streaming/MicroBatchExecution.scala| 3 +-- 2 files changed, 6 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
new a6df16b [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations

a6df16b is described below

commit a6df16b36210da32359c77205920eaee98d3e232
Author: Yuanjian Li
AuthorDate: Sat Aug 22 21:32:23 2020 +0900

[SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations

### What changes were proposed in this pull request?
Rephrase the descriptions for some operations to make them clearer.

### Why are the changes needed?
Add more detail to the document.

### Does this PR introduce _any_ user-facing change?
No, document only.

### How was this patch tested?
Document only.

Closes #29269 from xuanyuanking/SPARK-31792-follow.

Authored-by: Yuanjian Li
Signed-off-by: Jungtaek Lim (HeartSaVioR)
(cherry picked from commit 8b26c69ce7f905a3c7bbabb1c47ee6a51a23)
Signed-off-by: Jungtaek Lim (HeartSaVioR)
---
docs/web-ui.md | 10 +-
.../spark/sql/execution/streaming/MicroBatchExecution.scala| 3 +--
2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/docs/web-ui.md b/docs/web-ui.md
index 134a8c8..fe26043 100644
--- a/docs/web-ui.md
+++ b/docs/web-ui.md
@@ -425,11 +425,11 @@ queries. Currently, it contains the following metrics.
 * **Batch Duration.** The process duration of each batch.
 * **Operation Duration.** The amount of time taken to perform various operations in milliseconds. The tracked operations are listed as follows.
-* addBatch: Adds result data of the current batch to the sink.
-* getBatch: Gets a new batch of data to process.
-* latestOffset: Gets the latest offsets for sources.
-* queryPlanning: Generates the execution plan.
-* walCommit: Writes the offsets to the metadata log.
+* addBatch: Time taken to read the micro-batch's input data from the sources, process it, and write the batch's output to the sink. This should take the bulk of the micro-batch's time.
+* getBatch: Time taken to prepare the logical query to read the input of the current micro-batch from the sources.
+* latestOffset & getOffset: Time taken to query the maximum available offset for this source.
+* queryPlanning: Time taken to generate the execution plan.
+* walCommit: Time taken to write the offsets to the metadata log.

 As an early-release version, the statistics page is still under development and will be improved in future releases.

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala
index e022bfb..e0731db 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala
@@ -566,8 +566,7 @@ class MicroBatchExecution(
         val nextBatch = new Dataset(lastExecution, RowEncoder(lastExecution.analyzed.schema))

-        val batchSinkProgress: Option[StreamWriterCommitProgress] =
-          reportTimeTaken("addBatch") {
+        val batchSinkProgress: Option[StreamWriterCommitProgress] = reportTimeTaken("addBatch") {
           SQLExecution.withNewExecutionId(lastExecution) {
             sink match {
               case s: Sink => s.addBatch(currentBatchId, nextBatch)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
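The operation durations described in the doc change above are also available programmatically: each `StreamingQueryProgress` carries a `durationMs` map keyed by the same operation names the web UI charts. A minimal sketch, assuming `query` is an already-started `StreamingQuery`:

```scala
import org.apache.spark.sql.streaming.StreamingQuery

def printBatchTimings(query: StreamingQuery): Unit = {
  val progress = query.lastProgress // most recent micro-batch, or null before the first one
  if (progress != null) {
    // Keys include "addBatch", "getBatch", "latestOffset", "queryPlanning",
    // "walCommit", plus the overall "triggerExecution".
    progress.durationMs.forEach((op, ms) => println(s"$op took $ms ms"))
  }
}
```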
[spark] branch master updated (12f4331 -> 8b26c69)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 12f4331 [SPARK-32672][SQL] Fix data corruption in boolean bit set compression add 8b26c69 [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations No new revisions were added by this update. Summary of changes: docs/web-ui.md | 10 +- .../spark/sql/execution/streaming/MicroBatchExecution.scala| 3 +-- 2 files changed, 6 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (3eee915 -> 9151a58)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 3eee915 [MINOR][SQL] Add missing documentation for LongType mapping add 9151a58 [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager No new revisions were added by this update. Summary of changes: .../history/HistoryServerMemoryManager.scala | 5 +- .../apache/spark/deploy/history/HybridStore.scala | 6 +- .../deploy/history/FsHistoryProviderSuite.scala| 13 +- .../history/HistoryServerMemoryManagerSuite.scala | 55 + .../spark/deploy/history/HybridStoreSuite.scala| 232 + 5 files changed, 304 insertions(+), 7 deletions(-) create mode 100644 core/src/test/scala/org/apache/spark/deploy/history/HistoryServerMemoryManagerSuite.scala create mode 100644 core/src/test/scala/org/apache/spark/deploy/history/HybridStoreSuite.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
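[Editor's note] The HybridStore covered by the new test suites buffers event-log replay data in memory and later switches over to a disk-backed store once writing completes. The sketch below shows only that switch-over idea in simplified form; SimpleHybridStore and its members are invented stand-ins, not Spark's actual HybridStore API.

```scala
// Illustration only -- not the real HybridStore. Writes go to an
// in-memory map until switchToDisk() is called, after which all
// reads and writes are routed to the "disk" store.
import scala.collection.mutable

class SimpleHybridStore {
  private val memStore = mutable.Map.empty[String, AnyRef]
  @volatile private var diskStore: Option[mutable.Map[String, AnyRef]] = None

  def write(key: String, value: AnyRef): Unit = diskStore match {
    case Some(store) => store.put(key, value)    // after switch-over
    case None        => memStore.put(key, value) // still buffering in memory
  }

  // Once replay finishes: copy buffered entries over, then route all
  // further traffic to the disk-backed store and free the memory buffer.
  def switchToDisk(): Unit = {
    val store = mutable.Map.empty[String, AnyRef]
    memStore.foreach { case (k, v) => store.put(k, v) }
    diskStore = Some(store)
    memStore.clear()
  }

  def read(key: String): Option[AnyRef] =
    diskStore.getOrElse(memStore).get(key)
}
```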
[spark] branch master updated (4a09613 -> db89b0e)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 4a09613 Revert "[SPARK-32772][SQL][FOLLOWUP] Remove legacy silent support mode for spark-sql CLI" add db89b0e [SPARK-32831][SS] Refactor SupportsStreamingUpdate to represent actual meaning of the behavior No new revisions were added by this update. Summary of changes: .../apache/spark/sql/kafka010/KafkaSourceProvider.scala | 5 ++--- ...Update.scala => SupportsStreamingUpdateAsAppend.scala} | 15 +++ .../sql/execution/datasources/noop/NoopDataSource.scala | 5 ++--- .../spark/sql/execution/streaming/StreamExecution.scala | 6 +++--- .../apache/spark/sql/execution/streaming/console.scala| 7 +++ .../execution/streaming/sources/ForeachWriterTable.scala | 7 +++ .../spark/sql/execution/streaming/sources/memory.scala| 7 ++- 7 files changed, 26 insertions(+), 26 deletions(-) rename sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/{SupportsStreamingUpdate.scala => SupportsStreamingUpdateAsAppend.scala} (64%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
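[Editor's note] The rename in SPARK-32831 (SupportsStreamingUpdate to SupportsStreamingUpdateAsAppend) reflects what the trait actually means: the sink accepts Update output mode but persists each batch's rows as appends rather than in-place updates. A toy illustration of that marker-trait pattern follows; the surrounding types are simplified stand-ins, not Spark's real connector interfaces.

```scala
// Toy version of the marker-trait pattern: a sink advertises, via an
// empty mixin trait, that it handles streaming Update mode by appending.
trait WriteBuilder {
  def build(): String = "write"
}

// Marker: "I accept Update mode, but I will append the updated rows
// instead of rewriting them in place."
trait SupportsStreamingUpdateAsAppend extends WriteBuilder

// A console-like sink that cannot rewrite rows simply appends them.
class ConsoleWriteBuilder extends WriteBuilder with SupportsStreamingUpdateAsAppend

def validateUpdateMode(wb: WriteBuilder): Unit = wb match {
  case _: SupportsStreamingUpdateAsAppend =>
    println("Update mode accepted; rows will be appended")
  case _ =>
    println("Sink does not support streaming update")
}

validateUpdateMode(new ConsoleWriteBuilder)  // Update mode accepted
```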
[spark] branch master updated (7fdb571 -> d936cb3)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 7fdb571 [SPARK-32890][SQL] Pass all `sql/hive` module UTs in Scala 2.13 add d936cb3 [SPARK-26425][SS] Add more constraint checks to avoid checkpoint corruption No new revisions were added by this update. Summary of changes: .../spark/sql/execution/streaming/FileStreamSource.scala | 12 +--- .../spark/sql/execution/streaming/MicroBatchExecution.scala | 4 +++- 2 files changed, 12 insertions(+), 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
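[Editor's note] The constraint checks added by SPARK-26425 are sanity assertions of the form "offsets must not move backwards" applied before state reaches the checkpoint. A minimal sketch of that idea follows, assuming a simplified Offset type; the real checks in FileStreamSource and MicroBatchExecution differ in detail.

```scala
// Sketch of an offset monotonicity check: fail fast instead of writing
// an inconsistent offset into the checkpoint log. Offset is a stand-in.
final case class Offset(value: Long)

def assertMonotonic(committed: Option[Offset], available: Offset): Unit = {
  committed.foreach { c =>
    require(
      available.value >= c.value,
      s"Offset moved backwards: committed=${c.value}, available=${available.value}; " +
        "refusing to write a corrupt checkpoint")
  }
}

// Usage: this call throws IllegalArgumentException rather than
// silently committing a regressed offset.
assertMonotonic(Some(Offset(42)), Offset(41))
```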
[spark] branch master updated (f3ad32f -> 9ab0ec4)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f3ad32f [SPARK-33026][SQL][FOLLOWUP] metrics name should be numOutputRows add 9ab0ec4 [SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS No new revisions were added by this update. Summary of changes: .../spark/deploy/history/FsHistoryProvider.scala | 3 ++ .../deploy/history/FsHistoryProviderSuite.scala| 49 ++ 2 files changed, 52 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new d9669bd [SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS d9669bd is described below commit d9669bdf0ff4ed9951d7077b8dc9ad94507615c5 Author: Adam Binford AuthorDate: Thu Oct 15 11:59:29 2020 +0900 [SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS ### What changes were proposed in this pull request? Adds an additional check for non-fatal errors when attempting to add a new entry to the history server application listing. ### Why are the changes needed? A bad rolling event log folder (missing appstatus file or no log files) would cause no applications to be loaded by the Spark history server. Figuring out why invalid event log folders are created in the first place will be addressed in separate issues, this just lets the history server skip the invalid folder and successfully load all the valid applications. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? New UT Closes #30037 from Kimahriman/bug/rolling-log-crashing-history. Authored-by: Adam Binford Signed-off-by: Jungtaek Lim (HeartSaVioR) (cherry picked from commit 9ab0ec4e38e5df0537b38cb0f89e004ad57bec90) Signed-off-by: Jungtaek Lim (HeartSaVioR) --- .../spark/deploy/history/FsHistoryProvider.scala | 3 ++ .../deploy/history/FsHistoryProviderSuite.scala| 49 ++ 2 files changed, 52 insertions(+) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala index c262152..5970708 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala @@ -526,6 +526,9 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock) reader.fileSizeForLastIndex > 0 } catch { case _: FileNotFoundException => false +case NonFatal(e) => + logWarning(s"Error while reading new log ${reader.rootPath}", e) + false } case _: FileNotFoundException => diff --git a/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala index c2f34fc..f3beb35 100644 --- a/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala +++ b/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala @@ -1470,6 +1470,55 @@ class FsHistoryProviderSuite extends SparkFunSuite with Matchers with Logging { } } + test("SPARK-33146: don't let one bad rolling log folder prevent loading other applications") { +withTempDir { dir => + val conf = createTestConf(true) + conf.set(HISTORY_LOG_DIR, dir.getAbsolutePath) + val hadoopConf = SparkHadoopUtil.newConfiguration(conf) + val fs = new Path(dir.getAbsolutePath).getFileSystem(hadoopConf) + + val provider = new FsHistoryProvider(conf) + + val writer = new RollingEventLogFilesWriter("app", None, dir.toURI, conf, hadoopConf) + writer.start() + + writeEventsToRollingWriter(writer, Seq( +SparkListenerApplicationStart("app", Some("app"), 0, "user", None), +SparkListenerJobStart(1, 0, Seq.empty)), rollFile = false) + provider.checkForLogs() + provider.cleanLogs() + assert(dir.listFiles().size === 1) + assert(provider.getListing.length === 1) + + // Manually delete the appstatus file to make an invalid rolling event log + val appStatusPath = RollingEventLogFilesWriter.getAppStatusFilePath(new Path(writer.logPath), +"app", None, true) + fs.delete(appStatusPath, false) + provider.checkForLogs() + provider.cleanLogs() + assert(provider.getListing.length === 0) + + // Create a new application + val writer2 = new RollingEventLogFilesWriter("app2", None, dir.toURI, conf, hadoopConf) + writer2.start() + writeEventsToRollingWriter(writer2, Seq( +SparkListenerApplicationStart("app2", Some("app2"), 0, "user", None), +SparkListenerJobStart(1, 0, Seq.empty)), rollFile = false) + + // Both folders exist but only one application found + provider.checkForLogs() + provider.cleanLogs() + assert(provider.getListing.length === 1) + assert(dir.listFiles
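[Editor's note] The key piece of the fix above is the scala.util.control.NonFatal extractor: it matches recoverable exceptions while letting fatal JVM errors (such as OutOfMemoryError) propagate, so one unreadable log folder is logged and skipped instead of aborting the whole scan. Shown in isolation below, with an illustrative predicate body standing in for the real reader call.

```scala
// Standalone illustration of the NonFatal pattern used in
// FsHistoryProvider: FileNotFoundException and other non-fatal errors
// mean "skip this entry"; fatal errors are never swallowed.
import java.io.FileNotFoundException
import scala.util.control.NonFatal

def shouldLoadLog(rootPath: String): Boolean =
  try {
    // stands in for: reader.fileSizeForLastIndex > 0
    rootPath.nonEmpty
  } catch {
    case _: FileNotFoundException => false
    case NonFatal(e) =>
      System.err.println(s"Error while reading new log $rootPath: $e")
      false // skip this folder; keep loading the other applications
  }
```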
[spark] branch branch-3.0 updated (0bff1f6 -> 02f80cf)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 0bff1f6 [SPARK-33123][INFRA] Ignore GitHub only changes in Amplab Jenkins build new 15ed312 [SPARK-32557][CORE] Logging and swallowing the exception per entry in History server new 02f80cf Revert "Revert "[SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS"" The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../spark/deploy/history/FsHistoryProvider.scala | 7 +++- .../deploy/history/FsHistoryProviderSuite.scala| 49 ++ 2 files changed, 55 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
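[Editor's note] SPARK-32557's "logging and swallowing the exception per entry" is the companion pattern to the SPARK-33146 fix: wrap the per-entry work in its own try/catch so a single bad event-log entry cannot abort the listing pass. A hedged sketch with invented types follows; the real code operates on history-server log entries.

```scala
// Illustration only: per-entry error isolation. A failing entry is
// logged and skipped, and the loop continues with the remaining entries.
import scala.util.control.NonFatal

def processEntries[A](entries: Seq[A])(process: A => Unit): Unit =
  entries.foreach { entry =>
    try {
      process(entry)
    } catch {
      case NonFatal(e) =>
        // Swallow after logging so one corrupt entry does not stop the scan.
        System.err.println(s"Failed to process entry $entry, skipping: $e")
    }
  }
```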