[spark-website] branch asf-site updated: Add Jungtaek Lim to committers.md
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 00a4de3  Add Jungtaek Lim to committers.md
00a4de3 is described below

commit 00a4de3e0b7e2545dd3b67c4c43bd96dc8db83a0
Author: Jungtaek Lim (HeartSaVioR)
AuthorDate: Thu Jul 2 14:08:22 2020 +0900

    Add Jungtaek Lim to committers.md

    Author: Jungtaek Lim (HeartSaVioR)

    Closes #277 from HeartSaVioR/add-committer-heartsavior.
---
 committers.md        | 1 +
 site/committers.html | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/committers.md b/committers.md
index 2627f36..42b89d4 100644
--- a/committers.md
+++ b/committers.md
@@ -50,6 +50,7 @@ navigation:
 |Davies Liu|Juicedata|
 |Cheng Lian|Databricks|
 |Yanbo Liang|Facebook|
+|Jungtaek Lim|Cloudera|
 |Sean McNamara|Oracle|
 |Xiangrui Meng|Databricks|
 |Mridul Muralidharam|Google|
diff --git a/site/committers.html b/site/committers.html
index bb781af..5299961 100644
--- a/site/committers.html
+++ b/site/committers.html
@@ -371,6 +371,10 @@
   Facebook
 
+  Jungtaek Lim
+  Cloudera
+
+
   Sean McNamara
   Oracle

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (8b0a54e6 -> 3659611)
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 8b0a54e6  [SPARK-32057][SQL][TEST-HIVE1.2][TEST-HADOOP2.7] ExecuteStatement: cancel and close should not transiently ERROR
     add 3659611   [SPARK-32124][CORE][FOLLOW-UP] Use the invalid value Int.MinValue to fill the map index when the event logs from the old Spark version

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/TaskEndReason.scala          | 3 ++-
 core/src/main/scala/org/apache/spark/util/JsonProtocol.scala      | 8 ++++++--
 core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala | 2 +-
 3 files changed, 9 insertions(+), 4 deletions(-)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32124][CORE][FOLLOW-UP] Use the invalid value Int.MinValue to fill the map index when the event logs from the old Spark version
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 9237fb2  [SPARK-32124][CORE][FOLLOW-UP] Use the invalid value Int.MinValue to fill the map index when the event logs from the old Spark version
9237fb2 is described below

commit 9237fb2ca90d16208634d70cdff1a0ea9ddcce26
Author: Yuanjian Li
AuthorDate: Wed Jul 8 09:36:06 2020 +0900

    [SPARK-32124][CORE][FOLLOW-UP] Use the invalid value Int.MinValue to fill the map index when the event logs from the old Spark version

    ### What changes were proposed in this pull request?

    Use the invalid value Int.MinValue to fill the map index when the event logs are from an old Spark version.

    ### Why are the changes needed?

    Follow-up PR for #28941.

    ### Does this PR introduce _any_ user-facing change?

    When the Spark 3.0 history server reads an event log written by an older Spark version, it fills the map index with the invalid value Int.MinValue.

    ### How was this patch tested?

    Existing UT.

    Closes #28965 from xuanyuanking/follow-up.

    Authored-by: Yuanjian Li
    Signed-off-by: Jungtaek Lim (HeartSaVioR)
    (cherry picked from commit 365961155a655f19c9184b16ccd493838c848213)
    Signed-off-by: Jungtaek Lim (HeartSaVioR)
---
 core/src/main/scala/org/apache/spark/TaskEndReason.scala          | 3 ++-
 core/src/main/scala/org/apache/spark/util/JsonProtocol.scala      | 8 ++++++--
 core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala | 2 +-
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/TaskEndReason.scala b/core/src/main/scala/org/apache/spark/TaskEndReason.scala
index b13028f..6606d31 100644
--- a/core/src/main/scala/org/apache/spark/TaskEndReason.scala
+++ b/core/src/main/scala/org/apache/spark/TaskEndReason.scala
@@ -90,7 +90,8 @@ case class FetchFailed(
   extends TaskFailedReason {
   override def toErrorString: String = {
     val bmAddressString = if (bmAddress == null) "null" else bmAddress.toString
-    s"FetchFailed($bmAddressString, shuffleId=$shuffleId, mapIndex=$mapIndex, " +
+    val mapIndexString = if (mapIndex == Int.MinValue) "Unknown" else mapIndex.toString
+    s"FetchFailed($bmAddressString, shuffleId=$shuffleId, mapIndex=$mapIndexString, " +
       s"mapId=$mapId, reduceId=$reduceId, message=\n$message\n)"
   }

diff --git a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
index 74ff5c7..78fbd0c 100644
--- a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
+++ b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
@@ -993,8 +993,12 @@ private[spark] object JsonProtocol {
     val blockManagerAddress = blockManagerIdFromJson(json \ "Block Manager Address")
     val shuffleId = (json \ "Shuffle ID").extract[Int]
     val mapId = (json \ "Map ID").extract[Long]
-    val mapIndex = (json \ "Map Index") match {
-      case JNothing => 0
+    val mapIndex = json \ "Map Index" match {
+      case JNothing =>
+        // Note, we use the invalid value Int.MinValue here to fill the map index for backward
+        // compatibility. Otherwise, the fetch failed event will be dropped when the history
+        // server loads the event log written by the Spark version before 3.0.
+        Int.MinValue
       case x => x.extract[Int]
     }
     val reduceId = (json \ "Reduce ID").extract[Int]

diff --git a/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala b/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
index 98aaa9e..b77cd81 100644
--- a/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
@@ -312,7 +312,7 @@ class JsonProtocolSuite extends SparkFunSuite {
     val oldEvent = JsonProtocol.taskEndReasonToJson(fetchFailed)
       .removeField({ _._1 == "Map Index" })
     val expectedFetchFailed = FetchFailed(BlockManagerId("With or", "without you", 15), 17, 16L,
-      0, 19, "ignored")
+      Int.MinValue, 19, "ignored")
     assert(expectedFetchFailed === JsonProtocol.taskEndReasonFromJson(oldEvent))
   }

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
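For readers skimming the diff above: the pattern is a sentinel default for a JSON field that older writers never emitted. Below is a minimal, self-contained sketch of that pattern, assuming json4s (the JSON library JsonProtocol is built on) is on the classpath; `MapIndexCompat`, `UnknownMapIndex`, and `render` are hypothetical names for illustration, not Spark APIs.

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

// Hypothetical, standalone sketch: not Spark's JsonProtocol.
object MapIndexCompat {
  implicit val formats: Formats = DefaultFormats

  // Sentinel mirroring the patch: Int.MinValue means "map index unknown",
  // used when an event log written before Spark 3.0 lacks "Map Index".
  val UnknownMapIndex: Int = Int.MinValue

  def mapIndexFromJson(json: JValue): Int = json \ "Map Index" match {
    case JNothing => UnknownMapIndex // field absent: pre-3.0 event log
    case x        => x.extract[Int]  // field present: use the real index
  }

  // Mirrors the FetchFailed.toErrorString change: show the sentinel as "Unknown".
  def render(mapIndex: Int): String =
    if (mapIndex == UnknownMapIndex) "Unknown" else mapIndex.toString

  def main(args: Array[String]): Unit = {
    println(render(mapIndexFromJson(parse("""{"Shuffle ID": 17}""")))) // Unknown
    println(render(mapIndexFromJson(parse("""{"Map Index": 3}"""))))   // 3
  }
}
```

The point of the sentinel over the old default of 0 is that 0 is a valid map index, so a reader could not tell a genuinely missing field from a real failure on the first map partition.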
[spark] branch master updated (371b35d -> 8e7fc04)
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 371b35d  [SPARK-32214][SQL] The type conversion function generated in makeFromJava for "other" type uses a wrong variable
     add 8e7fc04  [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing

No new revisions were added by this update.

Summary of changes:
 .../deploy/history/HistoryServerDiskManager.scala  | 21 ++++++++++++++++++---
 .../history/HistoryServerDiskManagerSuite.scala    | 46 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+), 3 deletions(-)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new ac2c6cd3  [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
ac2c6cd3 is described below

commit ac2c6cd3cfae369f6d1af6abea567263d34a29b2
Author: Zhen Li
AuthorDate: Wed Jul 8 21:58:45 2020 +0900

    [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing

    ### What changes were proposed in this pull request?

    Update ApplicationStoreInfo.size to the real directory size during HistoryServerDiskManager initialization.

    ### Why are the changes needed?

    This PR fixes bug [SPARK-32024](https://issues.apache.org/jira/browse/SPARK-32024). We found that after a history server restart, the error below would randomly happen, raised from `HistoryServerDiskManager`:
    "java.lang.IllegalStateException: Disk usage tracker went negative (now = -***, delta = -***)"

    **Cause**: Reading data from LevelDB can trigger table file compaction, which may in turn change the size of the LevelDB directory. This size change may not be recorded in LevelDB (`ApplicationStoreInfo` in `listing`). When the service restarts, `currentUsage` is calculated from the real directory size, but `ApplicationStoreInfo` entries are loaded from LevelDB, so `currentUsage` may be less than the sum of `ApplicationStoreInfo.size`. In the `makeRoom()` function, `ApplicationStoreInfo.size` is used to [...]

    **Reproduce**: we can reproduce this issue in a dev environment by reducing the config values of "spark.history.retainedApplications" and "spark.history.store.maxDiskUsage" to some small values. Steps:
    1. Start the history server, load some applications, and access some pages (maybe the "stages" page, to trigger LevelDB compaction).
    2. Restart the history server and refresh pages.

    I also added a UT to simulate this case in `HistoryServerDiskManagerSuite`.

    **Benefit**: this change helps improve history server reliability.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Added a unit test and manually tested it.

    Closes #28859 from zhli1142015/update-ApplicationStoreInfo.size-during-disk-manager-initialize.

    Authored-by: Zhen Li
    Signed-off-by: Jungtaek Lim (HeartSaVioR)
    (cherry picked from commit 8e7fc04637bbb8d2fdc2c758746e0eaf496c4d92)
    Signed-off-by: Jungtaek Lim (HeartSaVioR)
---
 .../deploy/history/HistoryServerDiskManager.scala  | 21 ++++++++++++++++++---
 .../history/HistoryServerDiskManagerSuite.scala    | 46 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala
index 0a1f333..4ddd6d9 100644
--- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala
@@ -75,14 +75,29 @@ private class HistoryServerDiskManager(
     // Go through the recorded store directories and remove any that may have been removed by
     // external code.
-    val orphans = listing.view(classOf[ApplicationStoreInfo]).asScala.filter { info =>
-      !new File(info.path).exists()
-    }.toSeq
+    val (existences, orphans) = listing
+      .view(classOf[ApplicationStoreInfo])
+      .asScala
+      .toSeq
+      .partition { info =>
+        new File(info.path).exists()
+      }
 
     orphans.foreach { info =>
       listing.delete(info.getClass(), info.path)
     }
 
+    // Reading level db would trigger table file compaction, then it may cause size of level db
+    // directory changed. When service restarts, "currentUsage" is calculated from real directory
+    // size. Update "ApplicationStoreInfo.size" to ensure "currentUsage" equals
+    // sum of "ApplicationStoreInfo.size".
+    existences.foreach { info =>
+      val fileSize = sizeOf(new File(info.path))
+      if (fileSize != info.size) {
+        listing.write(info.copy(size = fileSize))
+      }
+    }
+
     logInfo("Initialized disk manager: " +
       s"current usage = ${Utils.bytesToString(currentUsage.get())}, " +
       s"max usage = ${Utils.bytesToString(maxUsage)}")
diff --git a/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala
index f78469e..b17880a 100644
--- a/core/src/test/scala/org/
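As a rough illustration of the fix above, here is a simplified sketch of the reconcile-on-startup idea, assuming plain case classes and an in-memory sequence in place of Spark's LevelDB-backed `listing`; `StoreInfo`, `DiskUsageReconciler`, and `reconcile` are hypothetical names, and `sizeOf` is a stand-in for the manager's real directory-size helper.

```scala
import java.io.File

// Hypothetical stand-in for ApplicationStoreInfo: a recorded store path and
// the size that was on record for it when the server last ran.
final case class StoreInfo(path: String, size: Long)

object DiskUsageReconciler {
  // Stand-in for the manager's directory-size helper: total bytes under a dir.
  def sizeOf(dir: File): Long = {
    val children = Option(dir.listFiles()).getOrElse(Array.empty[File])
    children.map(f => if (f.isDirectory) sizeOf(f) else f.length()).sum
  }

  // Split recorded stores into survivors and orphans, then re-measure each
  // survivor: LevelDB compaction can shrink a store directory without the
  // recorded size being updated, so stale sizes are refreshed before the
  // usage accounting (makeRoom) trusts them.
  def reconcile(recorded: Seq[StoreInfo]): (Seq[StoreInfo], Seq[StoreInfo]) = {
    val (existing, orphans) = recorded.partition(info => new File(info.path).exists())
    val refreshed = existing.map { info =>
      val actual = sizeOf(new File(info.path))
      if (actual != info.size) info.copy(size = actual) else info
    }
    (refreshed, orphans)
  }

  def main(args: Array[String]): Unit = {
    val recorded = Seq(StoreInfo("/tmp", 1L), StoreInfo("/no/such/dir", 2L))
    val (kept, orphans) = reconcile(recorded)
    println(s"kept=$kept orphans=$orphans") // /tmp re-measured; the other dropped
  }
}
```

The invariant being restored is the one the init log message reflects: `currentUsage`, computed from the real directories, must equal the sum of the recorded sizes, otherwise the usage tracker can go negative once `makeRoom()` subtracts stale sizes.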
[spark] branch branch-2.4 updated: [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new eddf40d  [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
eddf40d is described below

commit eddf40d5e48ada258dff45661816073cb39eb721
Author: Zhen Li
AuthorDate: Wed Jul 8 21:58:45 2020 +0900

    [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing

    ### What changes were proposed in this pull request?

    Update ApplicationStoreInfo.size to the real directory size during HistoryServerDiskManager initialization.

    ### Why are the changes needed?

    This PR fixes bug [SPARK-32024](https://issues.apache.org/jira/browse/SPARK-32024). We found that after a history server restart, the error below would randomly happen, raised from `HistoryServerDiskManager`:
    "java.lang.IllegalStateException: Disk usage tracker went negative (now = -***, delta = -***)"

    **Cause**: Reading data from LevelDB can trigger table file compaction, which may in turn change the size of the LevelDB directory. This size change may not be recorded in LevelDB (`ApplicationStoreInfo` in `listing`). When the service restarts, `currentUsage` is calculated from the real directory size, but `ApplicationStoreInfo` entries are loaded from LevelDB, so `currentUsage` may be less than the sum of `ApplicationStoreInfo.size`. In the `makeRoom()` function, `ApplicationStoreInfo.size` is used to [...]

    **Reproduce**: we can reproduce this issue in a dev environment by reducing the config values of "spark.history.retainedApplications" and "spark.history.store.maxDiskUsage" to some small values. Steps:
    1. Start the history server, load some applications, and access some pages (maybe the "stages" page, to trigger LevelDB compaction).
    2. Restart the history server and refresh pages.

    I also added a UT to simulate this case in `HistoryServerDiskManagerSuite`.

    **Benefit**: this change helps improve history server reliability.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Added a unit test and manually tested it.

    Closes #28859 from zhli1142015/update-ApplicationStoreInfo.size-during-disk-manager-initialize.

    Authored-by: Zhen Li
    Signed-off-by: Jungtaek Lim (HeartSaVioR)
    (cherry picked from commit 8e7fc04637bbb8d2fdc2c758746e0eaf496c4d92)
    Signed-off-by: Jungtaek Lim (HeartSaVioR)
---
 .../deploy/history/HistoryServerDiskManager.scala  | 21 ++++++++++++++++++---
 .../history/HistoryServerDiskManagerSuite.scala    | 46 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala
index ad0dd23..dc17140 100644
--- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala
@@ -76,14 +76,29 @@ private class HistoryServerDiskManager(
     // Go through the recorded store directories and remove any that may have been removed by
     // external code.
-    val orphans = listing.view(classOf[ApplicationStoreInfo]).asScala.filter { info =>
-      !new File(info.path).exists()
-    }.toSeq
+    val (existences, orphans) = listing
+      .view(classOf[ApplicationStoreInfo])
+      .asScala
+      .toSeq
+      .partition { info =>
+        new File(info.path).exists()
+      }
 
     orphans.foreach { info =>
       listing.delete(info.getClass(), info.path)
     }
 
+    // Reading level db would trigger table file compaction, then it may cause size of level db
+    // directory changed. When service restarts, "currentUsage" is calculated from real directory
+    // size. Update "ApplicationStoreInfo.size" to ensure "currentUsage" equals
+    // sum of "ApplicationStoreInfo.size".
+    existences.foreach { info =>
+      val fileSize = sizeOf(new File(info.path))
+      if (fileSize != info.size) {
+        listing.write(info.copy(size = fileSize))
+      }
+    }
+
     logInfo("Initialized disk manager: " +
       s"current usage = ${Utils.bytesToString(currentUsage.get())}, " +
       s"max usage = ${Utils.bytesToString(maxUsage)}")
diff --git a/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala
index 4b1b921..a050519 100644
--- a/core/src/test/scala/org/
[spark] branch master updated (371b35d -> 8e7fc04)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 371b35d [SPARK-32214][SQL] The type conversion function generated in makeFromJava for "other" type uses a wrong variable add 8e7fc04 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing No new revisions were added by this update. Summary of changes: .../deploy/history/HistoryServerDiskManager.scala | 21 -- .../history/HistoryServerDiskManagerSuite.scala| 46 ++ 2 files changed, 64 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new ac2c6cd3 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing ac2c6cd3 is described below commit ac2c6cd3cfae369f6d1af6abea567263d34a29b2 Author: Zhen Li AuthorDate: Wed Jul 8 21:58:45 2020 +0900 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing ### What changes were proposed in this pull request? Update ApplicationStoreInfo.size to real size during HistoryServerDiskManager initializing. ### Why are the changes needed? This PR is for fixing bug [32024](https://issues.apache.org/jira/browse/SPARK-32024). We found after history server restart, below error would randomly happen: "java.lang.IllegalStateException: Disk usage tracker went negative (now = -***, delta = -***)" from `HistoryServerDiskManager`.  **Cause**: Reading data from level db would trigger table file compaction, which may also trigger size of level db directory changes. This size change may not be recorded in LevelDB (`ApplicationStoreInfo` in `listing`). When service restarts, `currentUsage` is calculated from real directory size, but `ApplicationStoreInfo` are loaded from leveldb, then `currentUsage` may be less then sum of `ApplicationStoreInfo.size`. In `makeRoom()` function, `ApplicationStoreInfo.size` is used to [...] **Reproduce**: we can reproduce this issue in dev environment by reducing config value of "spark.history.retainedApplications" and "spark.history.store.maxDiskUsage" to some small values. Here are steps: 1. start history server, load some applications and access some pages (maybe "stages" page to trigger leveldb compaction). 2. restart HS, and refresh pages. I also added an UT to simulate this case in `HistoryServerDiskManagerSuite`. **Benefit**: this change would help improve history server reliability. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add unit test and manually tested it. Closes #28859 from zhli1142015/update-ApplicationStoreInfo.size-during-disk-manager-initialize. Authored-by: Zhen Li Signed-off-by: Jungtaek Lim (HeartSaVioR) (cherry picked from commit 8e7fc04637bbb8d2fdc2c758746e0eaf496c4d92) Signed-off-by: Jungtaek Lim (HeartSaVioR) --- .../deploy/history/HistoryServerDiskManager.scala | 21 -- .../history/HistoryServerDiskManagerSuite.scala| 46 ++ 2 files changed, 64 insertions(+), 3 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala index 0a1f333..4ddd6d9 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala @@ -75,14 +75,29 @@ private class HistoryServerDiskManager( // Go through the recorded store directories and remove any that may have been removed by // external code. 
-val orphans = listing.view(classOf[ApplicationStoreInfo]).asScala.filter { info => - !new File(info.path).exists() -}.toSeq +val (existences, orphans) = listing + .view(classOf[ApplicationStoreInfo]) + .asScala + .toSeq + .partition { info => +new File(info.path).exists() + } orphans.foreach { info => listing.delete(info.getClass(), info.path) } +// Reading level db would trigger table file compaction, then it may cause size of level db +// directory changed. When service restarts, "currentUsage" is calculated from real directory +// size. Update "ApplicationStoreInfo.size" to ensure "currentUsage" equals +// sum of "ApplicationStoreInfo.size". +existences.foreach { info => + val fileSize = sizeOf(new File(info.path)) + if (fileSize != info.size) { +listing.write(info.copy(size = fileSize)) + } +} + logInfo("Initialized disk manager: " + s"current usage = ${Utils.bytesToString(currentUsage.get())}, " + s"max usage = ${Utils.bytesToString(maxUsage)}") diff --git a/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala index f78469e..b17880a 100644 --- a/core/src/test/scala/org/
[spark] branch branch-2.4 updated: [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new eddf40d [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing eddf40d is described below commit eddf40d5e48ada258dff45661816073cb39eb721 Author: Zhen Li AuthorDate: Wed Jul 8 21:58:45 2020 +0900 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing ### What changes were proposed in this pull request? Update ApplicationStoreInfo.size to real size during HistoryServerDiskManager initializing. ### Why are the changes needed? This PR is for fixing bug [32024](https://issues.apache.org/jira/browse/SPARK-32024). We found after history server restart, below error would randomly happen: "java.lang.IllegalStateException: Disk usage tracker went negative (now = -***, delta = -***)" from `HistoryServerDiskManager`.  **Cause**: Reading data from level db would trigger table file compaction, which may also trigger size of level db directory changes. This size change may not be recorded in LevelDB (`ApplicationStoreInfo` in `listing`). When service restarts, `currentUsage` is calculated from real directory size, but `ApplicationStoreInfo` are loaded from leveldb, then `currentUsage` may be less then sum of `ApplicationStoreInfo.size`. In `makeRoom()` function, `ApplicationStoreInfo.size` is used to [...] **Reproduce**: we can reproduce this issue in dev environment by reducing config value of "spark.history.retainedApplications" and "spark.history.store.maxDiskUsage" to some small values. Here are steps: 1. start history server, load some applications and access some pages (maybe "stages" page to trigger leveldb compaction). 2. restart HS, and refresh pages. I also added an UT to simulate this case in `HistoryServerDiskManagerSuite`. **Benefit**: this change would help improve history server reliability. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add unit test and manually tested it. Closes #28859 from zhli1142015/update-ApplicationStoreInfo.size-during-disk-manager-initialize. Authored-by: Zhen Li Signed-off-by: Jungtaek Lim (HeartSaVioR) (cherry picked from commit 8e7fc04637bbb8d2fdc2c758746e0eaf496c4d92) Signed-off-by: Jungtaek Lim (HeartSaVioR) --- .../deploy/history/HistoryServerDiskManager.scala | 21 -- .../history/HistoryServerDiskManagerSuite.scala| 46 ++ 2 files changed, 64 insertions(+), 3 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala index ad0dd23..dc17140 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala @@ -76,14 +76,29 @@ private class HistoryServerDiskManager( // Go through the recorded store directories and remove any that may have been removed by // external code. 
-val orphans = listing.view(classOf[ApplicationStoreInfo]).asScala.filter { info => - !new File(info.path).exists() -}.toSeq +val (existences, orphans) = listing + .view(classOf[ApplicationStoreInfo]) + .asScala + .toSeq + .partition { info => +new File(info.path).exists() + } orphans.foreach { info => listing.delete(info.getClass(), info.path) } +// Reading level db would trigger table file compaction, then it may cause size of level db +// directory changed. When service restarts, "currentUsage" is calculated from real directory +// size. Update "ApplicationStoreInfo.size" to ensure "currentUsage" equals +// sum of "ApplicationStoreInfo.size". +existences.foreach { info => + val fileSize = sizeOf(new File(info.path)) + if (fileSize != info.size) { +listing.write(info.copy(size = fileSize)) + } +} + logInfo("Initialized disk manager: " + s"current usage = ${Utils.bytesToString(currentUsage.get())}, " + s"max usage = ${Utils.bytesToString(maxUsage)}") diff --git a/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala index 4b1b921..a050519 100644 --- a/core/src/test/scala/org/
[spark] branch master updated (371b35d -> 8e7fc04)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 371b35d [SPARK-32214][SQL] The type conversion function generated in makeFromJava for "other" type uses a wrong variable add 8e7fc04 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing No new revisions were added by this update. Summary of changes: .../deploy/history/HistoryServerDiskManager.scala | 21 -- .../history/HistoryServerDiskManagerSuite.scala| 46 ++ 2 files changed, 64 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new ac2c6cd3 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing ac2c6cd3 is described below commit ac2c6cd3cfae369f6d1af6abea567263d34a29b2 Author: Zhen Li AuthorDate: Wed Jul 8 21:58:45 2020 +0900 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing ### What changes were proposed in this pull request? Update ApplicationStoreInfo.size to real size during HistoryServerDiskManager initializing. ### Why are the changes needed? This PR is for fixing bug [32024](https://issues.apache.org/jira/browse/SPARK-32024). We found after history server restart, below error would randomly happen: "java.lang.IllegalStateException: Disk usage tracker went negative (now = -***, delta = -***)" from `HistoryServerDiskManager`.  **Cause**: Reading data from level db would trigger table file compaction, which may also trigger size of level db directory changes. This size change may not be recorded in LevelDB (`ApplicationStoreInfo` in `listing`). When service restarts, `currentUsage` is calculated from real directory size, but `ApplicationStoreInfo` are loaded from leveldb, then `currentUsage` may be less then sum of `ApplicationStoreInfo.size`. In `makeRoom()` function, `ApplicationStoreInfo.size` is used to [...] **Reproduce**: we can reproduce this issue in dev environment by reducing config value of "spark.history.retainedApplications" and "spark.history.store.maxDiskUsage" to some small values. Here are steps: 1. start history server, load some applications and access some pages (maybe "stages" page to trigger leveldb compaction). 2. restart HS, and refresh pages. I also added an UT to simulate this case in `HistoryServerDiskManagerSuite`. **Benefit**: this change would help improve history server reliability. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add unit test and manually tested it. Closes #28859 from zhli1142015/update-ApplicationStoreInfo.size-during-disk-manager-initialize. Authored-by: Zhen Li Signed-off-by: Jungtaek Lim (HeartSaVioR) (cherry picked from commit 8e7fc04637bbb8d2fdc2c758746e0eaf496c4d92) Signed-off-by: Jungtaek Lim (HeartSaVioR) --- .../deploy/history/HistoryServerDiskManager.scala | 21 -- .../history/HistoryServerDiskManagerSuite.scala| 46 ++ 2 files changed, 64 insertions(+), 3 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala index 0a1f333..4ddd6d9 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala @@ -75,14 +75,29 @@ private class HistoryServerDiskManager( // Go through the recorded store directories and remove any that may have been removed by // external code. 
-val orphans = listing.view(classOf[ApplicationStoreInfo]).asScala.filter { info => - !new File(info.path).exists() -}.toSeq +val (existences, orphans) = listing + .view(classOf[ApplicationStoreInfo]) + .asScala + .toSeq + .partition { info => +new File(info.path).exists() + } orphans.foreach { info => listing.delete(info.getClass(), info.path) } +// Reading level db would trigger table file compaction, then it may cause size of level db +// directory changed. When service restarts, "currentUsage" is calculated from real directory +// size. Update "ApplicationStoreInfo.size" to ensure "currentUsage" equals +// sum of "ApplicationStoreInfo.size". +existences.foreach { info => + val fileSize = sizeOf(new File(info.path)) + if (fileSize != info.size) { +listing.write(info.copy(size = fileSize)) + } +} + logInfo("Initialized disk manager: " + s"current usage = ${Utils.bytesToString(currentUsage.get())}, " + s"max usage = ${Utils.bytesToString(maxUsage)}") diff --git a/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala index f78469e..b17880a 100644 --- a/core/src/test/scala/org/
[spark] branch branch-2.4 updated: [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new eddf40d [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing eddf40d is described below commit eddf40d5e48ada258dff45661816073cb39eb721 Author: Zhen Li AuthorDate: Wed Jul 8 21:58:45 2020 +0900 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing ### What changes were proposed in this pull request? Update ApplicationStoreInfo.size to real size during HistoryServerDiskManager initializing. ### Why are the changes needed? This PR is for fixing bug [32024](https://issues.apache.org/jira/browse/SPARK-32024). We found after history server restart, below error would randomly happen: "java.lang.IllegalStateException: Disk usage tracker went negative (now = -***, delta = -***)" from `HistoryServerDiskManager`.  **Cause**: Reading data from level db would trigger table file compaction, which may also trigger size of level db directory changes. This size change may not be recorded in LevelDB (`ApplicationStoreInfo` in `listing`). When service restarts, `currentUsage` is calculated from real directory size, but `ApplicationStoreInfo` are loaded from leveldb, then `currentUsage` may be less then sum of `ApplicationStoreInfo.size`. In `makeRoom()` function, `ApplicationStoreInfo.size` is used to [...] **Reproduce**: we can reproduce this issue in dev environment by reducing config value of "spark.history.retainedApplications" and "spark.history.store.maxDiskUsage" to some small values. Here are steps: 1. start history server, load some applications and access some pages (maybe "stages" page to trigger leveldb compaction). 2. restart HS, and refresh pages. I also added an UT to simulate this case in `HistoryServerDiskManagerSuite`. **Benefit**: this change would help improve history server reliability. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add unit test and manually tested it. Closes #28859 from zhli1142015/update-ApplicationStoreInfo.size-during-disk-manager-initialize. Authored-by: Zhen Li Signed-off-by: Jungtaek Lim (HeartSaVioR) (cherry picked from commit 8e7fc04637bbb8d2fdc2c758746e0eaf496c4d92) Signed-off-by: Jungtaek Lim (HeartSaVioR) --- .../deploy/history/HistoryServerDiskManager.scala | 21 -- .../history/HistoryServerDiskManagerSuite.scala| 46 ++ 2 files changed, 64 insertions(+), 3 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala index ad0dd23..dc17140 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala @@ -76,14 +76,29 @@ private class HistoryServerDiskManager( // Go through the recorded store directories and remove any that may have been removed by // external code. 
-val orphans = listing.view(classOf[ApplicationStoreInfo]).asScala.filter { info => - !new File(info.path).exists() -}.toSeq +val (existences, orphans) = listing + .view(classOf[ApplicationStoreInfo]) + .asScala + .toSeq + .partition { info => +new File(info.path).exists() + } orphans.foreach { info => listing.delete(info.getClass(), info.path) } +// Reading level db would trigger table file compaction, then it may cause size of level db +// directory changed. When service restarts, "currentUsage" is calculated from real directory +// size. Update "ApplicationStoreInfo.size" to ensure "currentUsage" equals +// sum of "ApplicationStoreInfo.size". +existences.foreach { info => + val fileSize = sizeOf(new File(info.path)) + if (fileSize != info.size) { +listing.write(info.copy(size = fileSize)) + } +} + logInfo("Initialized disk manager: " + s"current usage = ${Utils.bytesToString(currentUsage.get())}, " + s"max usage = ${Utils.bytesToString(maxUsage)}") diff --git a/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala index 4b1b921..a050519 100644 --- a/core/src/test/scala/org/
[spark] branch master updated (371b35d -> 8e7fc04)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 371b35d [SPARK-32214][SQL] The type conversion function generated in makeFromJava for "other" type uses a wrong variable add 8e7fc04 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing No new revisions were added by this update. Summary of changes: .../deploy/history/HistoryServerDiskManager.scala | 21 -- .../history/HistoryServerDiskManagerSuite.scala| 46 ++ 2 files changed, 64 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new ac2c6cd3 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing ac2c6cd3 is described below commit ac2c6cd3cfae369f6d1af6abea567263d34a29b2 Author: Zhen Li AuthorDate: Wed Jul 8 21:58:45 2020 +0900 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing ### What changes were proposed in this pull request? Update ApplicationStoreInfo.size to real size during HistoryServerDiskManager initializing. ### Why are the changes needed? This PR is for fixing bug [32024](https://issues.apache.org/jira/browse/SPARK-32024). We found after history server restart, below error would randomly happen: "java.lang.IllegalStateException: Disk usage tracker went negative (now = -***, delta = -***)" from `HistoryServerDiskManager`.  **Cause**: Reading data from level db would trigger table file compaction, which may also trigger size of level db directory changes. This size change may not be recorded in LevelDB (`ApplicationStoreInfo` in `listing`). When service restarts, `currentUsage` is calculated from real directory size, but `ApplicationStoreInfo` are loaded from leveldb, then `currentUsage` may be less then sum of `ApplicationStoreInfo.size`. In `makeRoom()` function, `ApplicationStoreInfo.size` is used to [...] **Reproduce**: we can reproduce this issue in dev environment by reducing config value of "spark.history.retainedApplications" and "spark.history.store.maxDiskUsage" to some small values. Here are steps: 1. start history server, load some applications and access some pages (maybe "stages" page to trigger leveldb compaction). 2. restart HS, and refresh pages. I also added an UT to simulate this case in `HistoryServerDiskManagerSuite`. **Benefit**: this change would help improve history server reliability. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add unit test and manually tested it. Closes #28859 from zhli1142015/update-ApplicationStoreInfo.size-during-disk-manager-initialize. Authored-by: Zhen Li Signed-off-by: Jungtaek Lim (HeartSaVioR) (cherry picked from commit 8e7fc04637bbb8d2fdc2c758746e0eaf496c4d92) Signed-off-by: Jungtaek Lim (HeartSaVioR) --- .../deploy/history/HistoryServerDiskManager.scala | 21 -- .../history/HistoryServerDiskManagerSuite.scala| 46 ++ 2 files changed, 64 insertions(+), 3 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala index 0a1f333..4ddd6d9 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala @@ -75,14 +75,29 @@ private class HistoryServerDiskManager( // Go through the recorded store directories and remove any that may have been removed by // external code. 
-val orphans = listing.view(classOf[ApplicationStoreInfo]).asScala.filter { info => - !new File(info.path).exists() -}.toSeq +val (existences, orphans) = listing + .view(classOf[ApplicationStoreInfo]) + .asScala + .toSeq + .partition { info => +new File(info.path).exists() + } orphans.foreach { info => listing.delete(info.getClass(), info.path) } +// Reading level db would trigger table file compaction, then it may cause size of level db +// directory changed. When service restarts, "currentUsage" is calculated from real directory +// size. Update "ApplicationStoreInfo.size" to ensure "currentUsage" equals +// sum of "ApplicationStoreInfo.size". +existences.foreach { info => + val fileSize = sizeOf(new File(info.path)) + if (fileSize != info.size) { +listing.write(info.copy(size = fileSize)) + } +} + logInfo("Initialized disk manager: " + s"current usage = ${Utils.bytesToString(currentUsage.get())}, " + s"max usage = ${Utils.bytesToString(maxUsage)}") diff --git a/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala index f78469e..b17880a 100644 --- a/core/src/test/scala/org/
[spark] branch branch-2.4 updated: [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new eddf40d [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing eddf40d is described below commit eddf40d5e48ada258dff45661816073cb39eb721 Author: Zhen Li AuthorDate: Wed Jul 8 21:58:45 2020 +0900 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing ### What changes were proposed in this pull request? Update ApplicationStoreInfo.size to real size during HistoryServerDiskManager initializing. ### Why are the changes needed? This PR is for fixing bug [32024](https://issues.apache.org/jira/browse/SPARK-32024). We found after history server restart, below error would randomly happen: "java.lang.IllegalStateException: Disk usage tracker went negative (now = -***, delta = -***)" from `HistoryServerDiskManager`.  **Cause**: Reading data from level db would trigger table file compaction, which may also trigger size of level db directory changes. This size change may not be recorded in LevelDB (`ApplicationStoreInfo` in `listing`). When service restarts, `currentUsage` is calculated from real directory size, but `ApplicationStoreInfo` are loaded from leveldb, then `currentUsage` may be less then sum of `ApplicationStoreInfo.size`. In `makeRoom()` function, `ApplicationStoreInfo.size` is used to [...] **Reproduce**: we can reproduce this issue in dev environment by reducing config value of "spark.history.retainedApplications" and "spark.history.store.maxDiskUsage" to some small values. Here are steps: 1. start history server, load some applications and access some pages (maybe "stages" page to trigger leveldb compaction). 2. restart HS, and refresh pages. I also added an UT to simulate this case in `HistoryServerDiskManagerSuite`. **Benefit**: this change would help improve history server reliability. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add unit test and manually tested it. Closes #28859 from zhli1142015/update-ApplicationStoreInfo.size-during-disk-manager-initialize. Authored-by: Zhen Li Signed-off-by: Jungtaek Lim (HeartSaVioR) (cherry picked from commit 8e7fc04637bbb8d2fdc2c758746e0eaf496c4d92) Signed-off-by: Jungtaek Lim (HeartSaVioR) --- .../deploy/history/HistoryServerDiskManager.scala | 21 -- .../history/HistoryServerDiskManagerSuite.scala| 46 ++ 2 files changed, 64 insertions(+), 3 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala index ad0dd23..dc17140 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala @@ -76,14 +76,29 @@ private class HistoryServerDiskManager( // Go through the recorded store directories and remove any that may have been removed by // external code. 
-val orphans = listing.view(classOf[ApplicationStoreInfo]).asScala.filter { info => - !new File(info.path).exists() -}.toSeq +val (existences, orphans) = listing + .view(classOf[ApplicationStoreInfo]) + .asScala + .toSeq + .partition { info => +new File(info.path).exists() + } orphans.foreach { info => listing.delete(info.getClass(), info.path) } +// Reading level db would trigger table file compaction, then it may cause size of level db +// directory changed. When service restarts, "currentUsage" is calculated from real directory +// size. Update "ApplicationStoreInfo.size" to ensure "currentUsage" equals +// sum of "ApplicationStoreInfo.size". +existences.foreach { info => + val fileSize = sizeOf(new File(info.path)) + if (fileSize != info.size) { +listing.write(info.copy(size = fileSize)) + } +} + logInfo("Initialized disk manager: " + s"current usage = ${Utils.bytesToString(currentUsage.get())}, " + s"max usage = ${Utils.bytesToString(maxUsage)}") diff --git a/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala index 4b1b921..a050519 100644 --- a/core/src/test/scala/org/
[spark] branch branch-3.0 updated: [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
new ac2c6cd3 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing

ac2c6cd3 is described below

commit ac2c6cd3cfae369f6d1af6abea567263d34a29b2
Author: Zhen Li
AuthorDate: Wed Jul 8 21:58:45 2020 +0900

[SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing

### What changes were proposed in this pull request?
Update ApplicationStoreInfo.size to the real directory size during HistoryServerDiskManager initialization.

### Why are the changes needed?
This PR fixes bug [32024](https://issues.apache.org/jira/browse/SPARK-32024). We found that, after a history server restart, the following error would randomly happen: "java.lang.IllegalStateException: Disk usage tracker went negative (now = -***, delta = -***)" from `HistoryServerDiskManager`.

**Cause**: Reading data from LevelDB can trigger table file compaction, which may in turn change the size of the LevelDB directory. This size change may not be recorded back into LevelDB (`ApplicationStoreInfo` in `listing`). When the service restarts, `currentUsage` is calculated from the real directory size, but the `ApplicationStoreInfo` entries are loaded from LevelDB, so `currentUsage` may be less than the sum of `ApplicationStoreInfo.size`. In the `makeRoom()` function, `ApplicationStoreInfo.size` is used to [...]

**Reproduce**: we can reproduce this issue in a dev environment by reducing the config values of "spark.history.retainedApplications" and "spark.history.store.maxDiskUsage" to some small values. Here are the steps: 1. start the history server, load some applications and access some pages (maybe the "stages" page, to trigger LevelDB compaction). 2. restart the history server and refresh the pages. I also added a UT to simulate this case in `HistoryServerDiskManagerSuite`.

**Benefit**: this change helps improve history server reliability.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a unit test and manually tested it.

Closes #28859 from zhli1142015/update-ApplicationStoreInfo.size-during-disk-manager-initialize.

Authored-by: Zhen Li
Signed-off-by: Jungtaek Lim (HeartSaVioR)
(cherry picked from commit 8e7fc04637bbb8d2fdc2c758746e0eaf496c4d92)
Signed-off-by: Jungtaek Lim (HeartSaVioR)
---
.../deploy/history/HistoryServerDiskManager.scala | 21 --
.../history/HistoryServerDiskManagerSuite.scala| 46 ++
2 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala
index 0a1f333..4ddd6d9 100644
--- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala
@@ -75,14 +75,29 @@ private class HistoryServerDiskManager(
     // Go through the recorded store directories and remove any that may have been removed by
     // external code.
-    val orphans = listing.view(classOf[ApplicationStoreInfo]).asScala.filter { info =>
-      !new File(info.path).exists()
-    }.toSeq
+    val (existences, orphans) = listing
+      .view(classOf[ApplicationStoreInfo])
+      .asScala
+      .toSeq
+      .partition { info =>
+        new File(info.path).exists()
+      }

     orphans.foreach { info =>
       listing.delete(info.getClass(), info.path)
     }

+    // Reading level db would trigger table file compaction, then it may cause size of level db
+    // directory changed. When service restarts, "currentUsage" is calculated from real directory
+    // size. Update "ApplicationStoreInfo.size" to ensure "currentUsage" equals
+    // sum of "ApplicationStoreInfo.size".
+    existences.foreach { info =>
+      val fileSize = sizeOf(new File(info.path))
+      if (fileSize != info.size) {
+        listing.write(info.copy(size = fileSize))
+      }
+    }
+
     logInfo("Initialized disk manager: " +
       s"current usage = ${Utils.bytesToString(currentUsage.get())}, " +
       s"max usage = ${Utils.bytesToString(maxUsage)}")

diff --git a/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala
index f78469e..b17880a 100644
--- a/core/src/test/scala/org/
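The essence of the fix is easier to see outside the diff. Below is a minimal, self-contained sketch of the same reconciliation pattern; `StoreInfo` and `reconcile` are illustrative stand-ins, not the real `HistoryServerDiskManager` types:

```scala
import java.io.File

// Hypothetical stand-in for ApplicationStoreInfo: a recorded store path plus
// the size last written to the listing database.
case class StoreInfo(path: String, size: Long)

// On startup: split the recorded entries into those whose directories still
// exist and orphans, then refresh the recorded sizes from disk so that the
// usage tracker's invariant (currentUsage == sum of recorded sizes) holds.
def reconcile(recorded: Seq[StoreInfo], sizeOf: File => Long): (Seq[StoreInfo], Seq[StoreInfo]) = {
  val (existing, orphans) = recorded.partition(info => new File(info.path).exists())
  val refreshed = existing.map { info =>
    val actual = sizeOf(new File(info.path))
    if (actual != info.size) info.copy(size = actual) else info
  }
  (refreshed, orphans) // the orphans are deleted from the listing by the caller
}
```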
[spark] branch master updated (4609f1f -> e6e43cb)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 4609f1f [SPARK-32207][SQL] Support 'F'-suffixed Float Literals add e6e43cb [SPARK-32242][SQL] CliSuite flakiness fix via differentiating cli driver bootup timeout and query execution timeout No new revisions were added by this update. Summary of changes: .../spark/sql/hive/thriftserver/CliSuite.scala | 22 +++--- 1 file changed, 19 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
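The patch itself is not quoted above, so as context here is a hedged sketch of the idea named in the title: give process start-up its own, longer deadline instead of sharing one timeout between booting the CLI driver and answering queries. `awaitCondition` and the budgets below are illustrative, not the actual CliSuite helpers:

```scala
import scala.concurrent.duration._

// Poll a condition until it holds or its own deadline expires.
def awaitCondition(timeout: FiniteDuration)(condition: => Boolean): Unit = {
  val deadline = timeout.fromNow
  while (!condition) {
    if (deadline.isOverdue()) throw new RuntimeException(s"timed out after $timeout")
    Thread.sleep(100)
  }
}

// Boot-up (forking a JVM, initializing a metastore) can legitimately take far
// longer than any single query, so the two phases get independent budgets
// rather than one shared timeout that flakes under load.
def runCliCheck(sawPrompt: => Boolean, sawAnswer: => Boolean): Unit = {
  awaitCondition(timeout = 5.minutes)(sawPrompt) // driver boot-up budget
  awaitCondition(timeout = 1.minute)(sawAnswer)  // query execution budget
}
```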
[spark] branch master updated (0c9196e -> 1b3fc9a)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 0c9196e [SPARK-32238][SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in ScalaUDF add 1b3fc9a [SPARK-32149][SHUFFLE] Improve file path name normalisation at block resolution within the external shuffle service No new revisions were added by this update. Summary of changes: .../spark/network/shuffle/ExecutorDiskUtils.java | 52 +- .../shuffle/ExternalShuffleBlockResolver.java | 3 -- .../shuffle/ExternalShuffleBlockResolverSuite.java | 27 --- 3 files changed, 10 insertions(+), 72 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
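For context on what the resolver normalises: the external shuffle service has to rebuild the same file path an executor used when writing a block. A hedged sketch of the well-known hashed-subdirectory layout follows; it is illustrative, not a copy of `ExecutorDiskUtils`:

```scala
import java.io.File

// A block file lives under one of the executor's local dirs, inside a
// sub-directory chosen from a non-negative hash of the file name. The shuffle
// service must derive exactly the same path from the same inputs.
def resolveBlockFile(localDirs: Array[String], subDirsPerLocalDir: Int, filename: String): File = {
  val hash = filename.hashCode & Integer.MAX_VALUE            // force non-negative
  val localDir = localDirs(hash % localDirs.length)           // pick the local dir
  val subDir = (hash / localDirs.length) % subDirsPerLocalDir // pick the sub-dir
  new File(new File(localDir, f"$subDir%02x"), filename)
}
```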
[spark] branch master updated (c4b0639 -> ad90cbf)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c4b0639 [SPARK-32270][SQL] Use TextFileFormat in CSV's schema inference with a different encoding add ad90cbf [SPARK-31831][SQL][TESTS] Use subclasses for mock in HiveSessionImplSuite No new revisions were added by this update. Summary of changes: .../hive/thriftserver/HiveSessionImplSuite.scala | 86 -- 1 file changed, 63 insertions(+), 23 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
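The title names the technique: rather than conjuring a framework mock of a Hive class (whose constructors differ across the supported Hive versions), the suite hands the code under test a small hand-written subclass. A generic, hedged sketch with illustrative names:

```scala
// Production-shaped class with behaviour the test does not want to run.
abstract class Operation(val statement: String) {
  def run(): Unit
  var closed = false
  def close(): Unit = { closed = true }
}

// Test double: a plain subclass that records interactions instead of doing
// real work. Unlike a reflective mock, it compiles against whichever parent
// signature the active Hive profile provides.
class OperationStub(statement: String) extends Operation(statement) {
  var ran = false
  override def run(): Unit = { ran = true }
}

val op = new OperationStub("SELECT 1")
op.run()
op.close()
assert(op.ran && op.closed)
```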
[spark] branch master updated (c602d79 -> 90b0c26)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c602d79 [SPARK-32311][PYSPARK][TESTS] Remove duplicate import add 90b0c26 [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster No new revisions were added by this update. Summary of changes: .../spark/deploy/history/FsHistoryProvider.scala | 99 +++ .../history/HistoryServerMemoryManager.scala | 85 ++ .../apache/spark/deploy/history/HybridStore.scala | 185 + .../org/apache/spark/internal/config/History.scala | 16 ++ docs/monitoring.md | 19 +++ 5 files changed, 404 insertions(+) create mode 100644 core/src/main/scala/org/apache/spark/deploy/history/HistoryServerMemoryManager.scala create mode 100644 core/src/main/scala/org/apache/spark/deploy/history/HybridStore.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
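The idea behind the new store, sketched under stated assumptions (the real `HybridStore` implements Spark's `KVStore` interface and is more involved): serve reads from a fast in-memory store while the data is copied to the disk-backed store in the background, then flip reads over once the copy finishes.

```scala
trait KV {
  def read(key: String): Option[String]
  def write(key: String, value: String): Unit
}

class InMemoryKV extends KV {
  private val data = scala.collection.concurrent.TrieMap.empty[String, String]
  def read(key: String): Option[String] = data.get(key)
  def write(key: String, value: String): Unit = data.put(key, value)
  def entries: Iterator[(String, String)] = data.iterator
}

// Writes and reads hit memory while the event log is being replayed; once
// replay finishes, the contents are dumped to the disk store on a background
// thread and subsequent reads are served from disk.
class HybridKV(memory: InMemoryKV, disk: KV) extends KV {
  @volatile private var current: KV = memory

  def read(key: String): Option[String] = current.read(key)
  def write(key: String, value: String): Unit = current.write(key, value)

  def switchToDisk(): Thread = {
    val t = new Thread(() => {
      memory.entries.foreach { case (k, v) => disk.write(k, v) }
      current = disk // future reads hit the disk store
    })
    t.start()
    t
  }
}
```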
[spark] branch master updated (af8e65f -> 542aefb)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from af8e65f [SPARK-32276][SQL] Remove redundant sorts before repartition nodes add 542aefb [SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in continuous mode No new revisions were added by this update. Summary of changes: .../analysis/UnsupportedOperationChecker.scala | 11 - .../datasources/v2/DataSourceV2Strategy.scala | 14 +- .../continuous/ContinuousCoalesceExec.scala| 45 --- .../continuous/ContinuousCoalesceRDD.scala | 137 --- .../streaming/continuous/ContinuousExecution.scala | 4 - .../shuffle/ContinuousShuffleReadRDD.scala | 80 .../shuffle/ContinuousShuffleReader.scala | 32 -- .../shuffle/ContinuousShuffleWriter.scala | 27 -- .../shuffle/RPCContinuousShuffleReader.scala | 138 --- .../shuffle/RPCContinuousShuffleWriter.scala | 60 --- .../execution/streaming/state/StateStoreRDD.scala | 14 +- .../shuffle/ContinuousShuffleSuite.scala | 423 - .../continuous/ContinuousAggregationSuite.scala| 134 --- 13 files changed, 2 insertions(+), 1117 deletions(-) delete mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousCoalesceExec.scala delete mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousCoalesceRDD.scala delete mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleReadRDD.scala delete mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleReader.scala delete mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleWriter.scala delete mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/RPCContinuousShuffleReader.scala delete mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/RPCContinuousShuffleWriter.scala delete mode 100644 sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleSuite.scala delete mode 100644 sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousAggregationSuite.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
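What remains user-visible after the removal: continuous mode keeps supporting only map-like operations, and a streaming aggregation under a continuous trigger is rejected up front. A hedged sketch of the call that fails (exception message paraphrased, not quoted from Spark):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().master("local[2]").appName("demo").getOrCreate()

// A streaming aggregation over the built-in rate source.
val counts = spark.readStream
  .format("rate")
  .load()
  .groupBy("value")
  .count()

// Fails at query start with an AnalysisException along the lines of
// "continuous processing does not support Aggregate operations".
counts.writeStream
  .format("console")
  .outputMode("complete")
  .trigger(Trigger.Continuous("1 second"))
  .start()
```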
[spark] branch master updated (fb51925 -> 9747e8f)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from fb51925 [SPARK-32335][K8S][TESTS] Remove Python2 test from K8s IT add 9747e8f [SPARK-31831][SQL][TESTS][FOLLOWUP] Put mocks for HiveSessionImplSuite in hive version related subdirectories No new revisions were added by this update. Summary of changes: sql/hive-thriftserver/pom.xml | 12 ++ .../hive/thriftserver/HiveSessionImplSuite.scala | 29 + .../thriftserver/GetCatalogsOperationMock.scala| 50 ++ .../thriftserver/GetCatalogsOperationMock.scala| 50 ++ 4 files changed, 113 insertions(+), 28 deletions(-) create mode 100644 sql/hive-thriftserver/v1.2/src/test/scala/org/apache/spark/sql/hive/thriftserver/GetCatalogsOperationMock.scala create mode 100644 sql/hive-thriftserver/v2.3/src/test/scala/org/apache/spark/sql/hive/thriftserver/GetCatalogsOperationMock.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (39181ff -> 7b9d755)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 39181ff [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable add 7b9d755 [SPARK-32350][CORE] Add batch-write on LevelDB to improve performance of HybridStore No new revisions were added by this update. Summary of changes: .../org/apache/spark/util/kvstore/LevelDB.java | 73 ++ .../apache/spark/deploy/history/HybridStore.scala | 9 +-- 2 files changed, 66 insertions(+), 16 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
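The batch-write primitive being wired in is the standard LevelDB one. A sketch against the `org.iq80.leveldb` API directly (Spark's own kvstore `LevelDB` wrapper differs, and the path below is illustrative): grouping many puts into a single `WriteBatch` amortises the per-write overhead, which is what speeds up dumping the in-memory store to disk.

```scala
import java.io.File
import java.nio.charset.StandardCharsets.UTF_8
import org.iq80.leveldb.Options
import org.iq80.leveldb.impl.Iq80DBFactory.factory

val db = factory.open(new File("/tmp/demo-leveldb"), new Options().createIfMissing(true))
try {
  val batch = db.createWriteBatch()
  try {
    (1 to 1000).foreach { i =>
      batch.put(s"key-$i".getBytes(UTF_8), s"value-$i".getBytes(UTF_8))
    }
    db.write(batch) // one buffered write instead of 1000 individual ones
  } finally {
    batch.close()
  }
} finally {
  db.close()
}
```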
[spark] branch master updated (9d7b1d9 -> f602782)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9d7b1d9 [SPARK-32175][SPARK-32175][FOLLOWUP] Remove flaky test added in add f602782 [SPARK-32482][SS][TESTS] Eliminate deprecated poll(long) API calls to avoid infinite wait in tests No new revisions were added by this update. Summary of changes: .../apache/spark/sql/kafka010/KafkaTestUtils.scala | 47 +++--- 1 file changed, 14 insertions(+), 33 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
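The root cause the title points at is worth spelling out: `KafkaConsumer.poll(long)` (deprecated since Kafka 2.0) only bounds the wait for records and can block indefinitely while fetching metadata, which is exactly how a test hangs forever; `poll(java.time.Duration)` bounds the whole call. A minimal consumer sketch (broker address and topic are illustrative):

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("group.id", "demo")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(Collections.singletonList("topic"))

// consumer.poll(1000L)  // deprecated: may block far beyond 1s on metadata
val records = consumer.poll(Duration.ofMillis(1000)) // returns within the timeout
records.forEach(r => println(s"${r.key} -> ${r.value}"))
consumer.close()
```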
[spark] branch master updated (ae82768 -> 813532d)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from ae82768 [SPARK-32421][SQL] Add code-gen for shuffled hash join add 813532d [SPARK-32468][SS][TESTS] Fix timeout config issue in Kafka connector tests No new revisions were added by this update. Summary of changes: .../apache/spark/sql/kafka010/KafkaContinuousSourceSuite.scala| 2 +- .../apache/spark/sql/kafka010/KafkaDontFailOnDataLossSuite.scala | 2 +- .../apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala| 8 3 files changed, 6 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
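As context for where such timeouts live (the exact keys touched by this patch are only visible in the linked suites): the Spark Kafka source forwards any option prefixed with `kafka.` verbatim to the underlying client, so a mis-specified timeout key or value silently misconfigures the consumer. An illustrative sketch:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").appName("demo").getOrCreate()

val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "topic")
  // forwarded verbatim to the Kafka consumer config "request.timeout.ms"
  .option("kafka.request.timeout.ms", "30000")
  .load()
```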
[spark] branch master updated (12f4331 -> 8b26c69)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 12f4331 [SPARK-32672][SQL] Fix data corruption in boolean bit set compression add 8b26c69 [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations No new revisions were added by this update. Summary of changes: docs/web-ui.md | 10 +- .../spark/sql/execution/streaming/MicroBatchExecution.scala| 3 +-- 2 files changed, 6 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
new a6df16b [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations

a6df16b is described below

commit a6df16b36210da32359c77205920eaee98d3e232
Author: Yuanjian Li
AuthorDate: Sat Aug 22 21:32:23 2020 +0900

[SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations

### What changes were proposed in this pull request?
Rephrase the descriptions for some operations to make them clearer.

### Why are the changes needed?
Add more detail to the document.

### Does this PR introduce _any_ user-facing change?
No, document only.

### How was this patch tested?
Document only.

Closes #29269 from xuanyuanking/SPARK-31792-follow.

Authored-by: Yuanjian Li
Signed-off-by: Jungtaek Lim (HeartSaVioR)
(cherry picked from commit 8b26c69ce7f905a3c7bbabb1c47ee6a51a23)
Signed-off-by: Jungtaek Lim (HeartSaVioR)
---
docs/web-ui.md | 10 +-
.../spark/sql/execution/streaming/MicroBatchExecution.scala| 3 +--
2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/docs/web-ui.md b/docs/web-ui.md
index 134a8c8..fe26043 100644
--- a/docs/web-ui.md
+++ b/docs/web-ui.md
@@ -425,11 +425,11 @@ queries. Currently, it contains the following metrics.
 * **Batch Duration.** The process duration of each batch.
 * **Operation Duration.** The amount of time taken to perform various operations in milliseconds. The tracked operations are listed as follows.
-* addBatch: Adds result data of the current batch to the sink.
-* getBatch: Gets a new batch of data to process.
-* latestOffset: Gets the latest offsets for sources.
-* queryPlanning: Generates the execution plan.
-* walCommit: Writes the offsets to the metadata log.
+* addBatch: Time taken to read the micro-batch's input data from the sources, process it, and write the batch's output to the sink. This should take the bulk of the micro-batch's time.
+* getBatch: Time taken to prepare the logical query to read the input of the current micro-batch from the sources.
+* latestOffset & getOffset: Time taken to query the maximum available offset for this source.
+* queryPlanning: Time taken to generate the execution plan.
+* walCommit: Time taken to write the offsets to the metadata log.

 As an early-release version, the statistics page is still under development and will be improved in future releases.

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala
index e022bfb..e0731db 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala
@@ -566,8 +566,7 @@ class MicroBatchExecution(
         val nextBatch = new Dataset(lastExecution, RowEncoder(lastExecution.analyzed.schema))

-        val batchSinkProgress: Option[StreamWriterCommitProgress] =
-          reportTimeTaken("addBatch") {
+        val batchSinkProgress: Option[StreamWriterCommitProgress] = reportTimeTaken("addBatch") {
           SQLExecution.withNewExecutionId(lastExecution) {
             sink match {
               case s: Sink => s.addBatch(currentBatchId, nextBatch)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
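The operation durations described in the doc change above are also available programmatically: each `StreamingQueryProgress` carries a `durationMs` map keyed by the same operation names the web UI charts. A minimal sketch, assuming `query` is an already-started `StreamingQuery`:

```scala
import org.apache.spark.sql.streaming.StreamingQuery

def printBatchTimings(query: StreamingQuery): Unit = {
  val progress = query.lastProgress // most recent micro-batch, or null before the first one
  if (progress != null) {
    // Keys include "addBatch", "getBatch", "latestOffset", "queryPlanning",
    // "walCommit", plus the overall "triggerExecution".
    progress.durationMs.forEach((op, ms) => println(s"$op took $ms ms"))
  }
}
```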
[spark] branch master updated (12f4331 -> 8b26c69)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 12f4331 [SPARK-32672][SQL] Fix data corruption in boolean bit set compression add 8b26c69 [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations No new revisions were added by this update. Summary of changes: docs/web-ui.md | 10 +- .../spark/sql/execution/streaming/MicroBatchExecution.scala| 3 +-- 2 files changed, 6 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (3eee915 -> 9151a58)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 3eee915 [MINOR][SQL] Add missing documentation for LongType mapping add 9151a58 [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager No new revisions were added by this update. Summary of changes: .../history/HistoryServerMemoryManager.scala | 5 +- .../apache/spark/deploy/history/HybridStore.scala | 6 +- .../deploy/history/FsHistoryProviderSuite.scala| 13 +- .../history/HistoryServerMemoryManagerSuite.scala | 55 + .../spark/deploy/history/HybridStoreSuite.scala| 232 + 5 files changed, 304 insertions(+), 7 deletions(-) create mode 100644 core/src/test/scala/org/apache/spark/deploy/history/HistoryServerMemoryManagerSuite.scala create mode 100644 core/src/test/scala/org/apache/spark/deploy/history/HybridStoreSuite.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
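[Editor's note] The HybridStore covered by the new test suites buffers event-log replay data in memory and later switches over to a disk-backed store once writing completes. The sketch below shows only that switch-over idea in simplified form; SimpleHybridStore and its members are invented stand-ins, not Spark's actual HybridStore API.

```scala
// Illustration only -- not the real HybridStore. Writes go to an
// in-memory map until switchToDisk() is called, after which all
// reads and writes are routed to the "disk" store.
import scala.collection.mutable

class SimpleHybridStore {
  private val memStore = mutable.Map.empty[String, AnyRef]
  @volatile private var diskStore: Option[mutable.Map[String, AnyRef]] = None

  def write(key: String, value: AnyRef): Unit = diskStore match {
    case Some(store) => store.put(key, value)    // after switch-over
    case None        => memStore.put(key, value) // still buffering in memory
  }

  // Once replay finishes: copy buffered entries over, then route all
  // further traffic to the disk-backed store and free the memory buffer.
  def switchToDisk(): Unit = {
    val store = mutable.Map.empty[String, AnyRef]
    memStore.foreach { case (k, v) => store.put(k, v) }
    diskStore = Some(store)
    memStore.clear()
  }

  def read(key: String): Option[AnyRef] =
    diskStore.getOrElse(memStore).get(key)
}
```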
[spark] branch master updated (4a09613 -> db89b0e)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 4a09613 Revert "[SPARK-32772][SQL][FOLLOWUP] Remove legacy silent support mode for spark-sql CLI" add db89b0e [SPARK-32831][SS] Refactor SupportsStreamingUpdate to represent actual meaning of the behavior No new revisions were added by this update. Summary of changes: .../apache/spark/sql/kafka010/KafkaSourceProvider.scala | 5 ++--- ...Update.scala => SupportsStreamingUpdateAsAppend.scala} | 15 +++ .../sql/execution/datasources/noop/NoopDataSource.scala | 5 ++--- .../spark/sql/execution/streaming/StreamExecution.scala | 6 +++--- .../apache/spark/sql/execution/streaming/console.scala| 7 +++ .../execution/streaming/sources/ForeachWriterTable.scala | 7 +++ .../spark/sql/execution/streaming/sources/memory.scala| 7 ++- 7 files changed, 26 insertions(+), 26 deletions(-) rename sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/{SupportsStreamingUpdate.scala => SupportsStreamingUpdateAsAppend.scala} (64%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
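[Editor's note] The rename in SPARK-32831 (SupportsStreamingUpdate to SupportsStreamingUpdateAsAppend) reflects what the trait actually means: the sink accepts Update output mode but persists each batch's rows as appends rather than in-place updates. A toy illustration of that marker-trait pattern follows; the surrounding types are simplified stand-ins, not Spark's real connector interfaces.

```scala
// Toy version of the marker-trait pattern: a sink advertises, via an
// empty mixin trait, that it handles streaming Update mode by appending.
trait WriteBuilder {
  def build(): String = "write"
}

// Marker: "I accept Update mode, but I will append the updated rows
// instead of rewriting them in place."
trait SupportsStreamingUpdateAsAppend extends WriteBuilder

// A console-like sink that cannot rewrite rows simply appends them.
class ConsoleWriteBuilder extends WriteBuilder with SupportsStreamingUpdateAsAppend

def validateUpdateMode(wb: WriteBuilder): Unit = wb match {
  case _: SupportsStreamingUpdateAsAppend =>
    println("Update mode accepted; rows will be appended")
  case _ =>
    println("Sink does not support streaming update")
}

validateUpdateMode(new ConsoleWriteBuilder)  // Update mode accepted
```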
[spark] branch master updated (7fdb571 -> d936cb3)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 7fdb571 [SPARK-32890][SQL] Pass all `sql/hive` module UTs in Scala 2.13 add d936cb3 [SPARK-26425][SS] Add more constraint checks to avoid checkpoint corruption No new revisions were added by this update. Summary of changes: .../spark/sql/execution/streaming/FileStreamSource.scala | 12 +--- .../spark/sql/execution/streaming/MicroBatchExecution.scala | 4 +++- 2 files changed, 12 insertions(+), 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
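[Editor's note] The constraint checks added by SPARK-26425 are sanity assertions of the form "offsets must not move backwards" applied before state reaches the checkpoint. A minimal sketch of that idea follows, assuming a simplified Offset type; the real checks in FileStreamSource and MicroBatchExecution differ in detail.

```scala
// Sketch of an offset monotonicity check: fail fast instead of writing
// an inconsistent offset into the checkpoint log. Offset is a stand-in.
final case class Offset(value: Long)

def assertMonotonic(committed: Option[Offset], available: Offset): Unit = {
  committed.foreach { c =>
    require(
      available.value >= c.value,
      s"Offset moved backwards: committed=${c.value}, available=${available.value}; " +
        "refusing to write a corrupt checkpoint")
  }
}

// Usage: this call throws IllegalArgumentException rather than
// silently committing a regressed offset.
assertMonotonic(Some(Offset(42)), Offset(41))
```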
[spark] branch master updated (f3ad32f -> 9ab0ec4)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f3ad32f [SPARK-33026][SQL][FOLLOWUP] metrics name should be numOutputRows add 9ab0ec4 [SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS No new revisions were added by this update. Summary of changes: .../spark/deploy/history/FsHistoryProvider.scala | 3 ++ .../deploy/history/FsHistoryProviderSuite.scala| 49 ++ 2 files changed, 52 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new d9669bd [SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS d9669bd is described below commit d9669bdf0ff4ed9951d7077b8dc9ad94507615c5 Author: Adam Binford AuthorDate: Thu Oct 15 11:59:29 2020 +0900 [SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS ### What changes were proposed in this pull request? Adds an additional check for non-fatal errors when attempting to add a new entry to the history server application listing. ### Why are the changes needed? A bad rolling event log folder (missing appstatus file or no log files) would cause no applications to be loaded by the Spark history server. Figuring out why invalid event log folders are created in the first place will be addressed in separate issues, this just lets the history server skip the invalid folder and successfully load all the valid applications. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? New UT Closes #30037 from Kimahriman/bug/rolling-log-crashing-history. Authored-by: Adam Binford Signed-off-by: Jungtaek Lim (HeartSaVioR) (cherry picked from commit 9ab0ec4e38e5df0537b38cb0f89e004ad57bec90) Signed-off-by: Jungtaek Lim (HeartSaVioR) --- .../spark/deploy/history/FsHistoryProvider.scala | 3 ++ .../deploy/history/FsHistoryProviderSuite.scala| 49 ++ 2 files changed, 52 insertions(+) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala index c262152..5970708 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala @@ -526,6 +526,9 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock) reader.fileSizeForLastIndex > 0 } catch { case _: FileNotFoundException => false +case NonFatal(e) => + logWarning(s"Error while reading new log ${reader.rootPath}", e) + false } case _: FileNotFoundException => diff --git a/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala index c2f34fc..f3beb35 100644 --- a/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala +++ b/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala @@ -1470,6 +1470,55 @@ class FsHistoryProviderSuite extends SparkFunSuite with Matchers with Logging { } } + test("SPARK-33146: don't let one bad rolling log folder prevent loading other applications") { +withTempDir { dir => + val conf = createTestConf(true) + conf.set(HISTORY_LOG_DIR, dir.getAbsolutePath) + val hadoopConf = SparkHadoopUtil.newConfiguration(conf) + val fs = new Path(dir.getAbsolutePath).getFileSystem(hadoopConf) + + val provider = new FsHistoryProvider(conf) + + val writer = new RollingEventLogFilesWriter("app", None, dir.toURI, conf, hadoopConf) + writer.start() + + writeEventsToRollingWriter(writer, Seq( +SparkListenerApplicationStart("app", Some("app"), 0, "user", None), +SparkListenerJobStart(1, 0, Seq.empty)), rollFile = false) + provider.checkForLogs() + provider.cleanLogs() + assert(dir.listFiles().size === 1) + assert(provider.getListing.length === 1) + + // Manually delete the appstatus file to make an invalid rolling event log + val appStatusPath = RollingEventLogFilesWriter.getAppStatusFilePath(new Path(writer.logPath), +"app", None, true) + fs.delete(appStatusPath, false) + provider.checkForLogs() + provider.cleanLogs() + assert(provider.getListing.length === 0) + + // Create a new application + val writer2 = new RollingEventLogFilesWriter("app2", None, dir.toURI, conf, hadoopConf) + writer2.start() + writeEventsToRollingWriter(writer2, Seq( +SparkListenerApplicationStart("app2", Some("app2"), 0, "user", None), +SparkListenerJobStart(1, 0, Seq.empty)), rollFile = false) + + // Both folders exist but only one application found + provider.checkForLogs() + provider.cleanLogs() + assert(provider.getListing.length === 1) + assert(dir.listFiles
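[Editor's note] The key piece of the fix above is the scala.util.control.NonFatal extractor: it matches recoverable exceptions while letting fatal JVM errors (such as OutOfMemoryError) propagate, so one unreadable log folder is logged and skipped instead of aborting the whole scan. Shown in isolation below, with an illustrative predicate body standing in for the real reader call.

```scala
// Standalone illustration of the NonFatal pattern used in
// FsHistoryProvider: FileNotFoundException and other non-fatal errors
// mean "skip this entry"; fatal errors are never swallowed.
import java.io.FileNotFoundException
import scala.util.control.NonFatal

def shouldLoadLog(rootPath: String): Boolean =
  try {
    // stands in for: reader.fileSizeForLastIndex > 0
    rootPath.nonEmpty
  } catch {
    case _: FileNotFoundException => false
    case NonFatal(e) =>
      System.err.println(s"Error while reading new log $rootPath: $e")
      false // skip this folder; keep loading the other applications
  }
```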
[spark] branch branch-3.0 updated (0bff1f6 -> 02f80cf)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 0bff1f6 [SPARK-33123][INFRA] Ignore GitHub only changes in Amplab Jenkins build new 15ed312 [SPARK-32557][CORE] Logging and swallowing the exception per entry in History server new 02f80cf Revert "Revert "[SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS"" The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../spark/deploy/history/FsHistoryProvider.scala | 7 +++- .../deploy/history/FsHistoryProviderSuite.scala| 49 ++ 2 files changed, 55 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
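[Editor's note] SPARK-32557's "logging and swallowing the exception per entry" is the companion pattern to the SPARK-33146 fix: wrap the per-entry work in its own try/catch so a single bad event-log entry cannot abort the listing pass. A hedged sketch with invented types follows; the real code operates on history-server log entries.

```scala
// Illustration only: per-entry error isolation. A failing entry is
// logged and skipped, and the loop continues with the remaining entries.
import scala.util.control.NonFatal

def processEntries[A](entries: Seq[A])(process: A => Unit): Unit =
  entries.foreach { entry =>
    try {
      process(entry)
    } catch {
      case NonFatal(e) =>
        // Swallow after logging so one corrupt entry does not stop the scan.
        System.err.println(s"Failed to process entry $entry, skipping: $e")
    }
  }
```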