[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2016-08-12 Thread jerryshao
GitHub user jerryshao opened a pull request:

https://github.com/apache/spark/pull/14617

[SPARK-17019][Core] Expose on-heap and off-heap memory usage various places

## What changes were proposed in this pull request?

With [SPARK-13992](https://issues.apache.org/jira/browse/SPARK-13992), 
Spark supports persisting data into off-heap memory, but on-heap and off-heap 
memory usage is not currently exposed, which makes it inconvenient for users 
to monitor and profile. This PR proposes exposing off-heap as well as 
on-heap memory usage in various places:

1. Spark UI's executor page will display both on-heap and off-heap memory 
usage.
2. REST request returns both on-heap and off-heap memory.
3. The same usage can also be obtained from the MetricsSystem.
4. Finally, this usage can be obtained programmatically from SparkListener.

Attach the UI changes:

![screen shot 2016-08-12 at 11 20 44 
am](https://cloud.githubusercontent.com/assets/850797/17612032/6c2f4480-607f-11e6-82e8-a27fb8cbb4ae.png)

Backward compatibility is also considered for the event log and REST API. Old 
event logs can still be replayed, with off-heap usage displayed as 0. The REST 
API only adds new fields, so JSON backward compatibility is preserved.

## How was this patch tested?

Unit test added and manual verification.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark SPARK-17019

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14617.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14617


commit a5e9d46aedf6de47f1e93de0afd8b5d913f2f36e
Author: jerryshao 
Date:   2016-08-12T02:51:12Z

Expose on-heap and off-heap memory usage in UI, REST API and SparkListener




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-16 Thread jsoltren
Github user jsoltren commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r106469513
  
--- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsPage.scala 
---
@@ -116,7 +121,11 @@ private[spark] object ExecutorsPage {
   totalShuffleRead,
   totalShuffleWrite,
   maxMem,
-  executorLogs
+  executorLogs,
+  onHeapMemUsed,
+  offHeapMemUsed,
+  maxOnHeapMem,
+  maxOffHeapMem
--- End diff --

This all looks correct.





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-16 Thread jsoltren
Github user jsoltren commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r106467583
  
--- Diff: 
core/src/main/resources/org/apache/spark/ui/static/executorspage.js ---
@@ -174,8 +174,10 @@ $(document).ready(function () {
 var summary = [];
 var allExecCnt = 0;
 var allRDDBlocks = 0;
-var allMemoryUsed = 0;
-var allMaxMemory = 0;
+var allOnHeapMemoryUsed = 0;
+var allOnHeapMaxMemory = 0;
+var allOffHeapMemoryUsed = 0;
+var allOffHeapMaxMemory = 0;
--- End diff --

This looks correct, and the order agrees with executorspage-template.html.





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-16 Thread jsoltren
Github user jsoltren commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r106467285
  
--- Diff: 
core/src/main/resources/org/apache/spark/ui/static/executorspage-template.html 
---
@@ -24,7 +24,10 @@ Summary
 
 RDD Blocks
 Storage Memory
+  title="Memory used / total available memory for on heap 
storage of data like RDD partitions cached in memory. "> On Heap Storage 
Memory
--- End diff --

Nit: trailing space. Should be:
memory.">





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-16 Thread jsoltren
Github user jsoltren commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r106467368
  
--- Diff: 
core/src/main/resources/org/apache/spark/ui/static/executorspage-template.html 
---
@@ -24,7 +24,10 @@ Summary
 
 RDD Blocks
 Storage Memory
+  title="Memory used / total available memory for on heap 
storage of data like RDD partitions cached in memory. "> On Heap Storage 
Memory
+
+ Off Heap Storage 
Memory
--- End diff --

Same nit for trailing space.





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-16 Thread jsoltren
Github user jsoltren commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r106469348
  
--- Diff: core/src/main/scala/org/apache/spark/storage/StorageUtils.scala 
---
@@ -60,11 +63,17 @@ class StorageStatus(val blockManagerId: BlockManagerId, 
val maxMem: Long) {
* non-RDD blocks contains only the first 3 fields (in the same order).
*/
   private val _rddStorageInfo = new mutable.HashMap[Int, (Long, Long, 
StorageLevel)]
-  private var _nonRddStorageInfo: (Long, Long) = (0L, 0L)
+
+  // On-heap memory, off-heap memory and disk usage of non rdd storage
+  private var _nonRddStorageInfo: (Long, Long, Long) = (0L, 0L, 0L)
--- End diff --

Another preference: I'd rather see this as a case class, so you 
could do something like:
_nonRddStorageInfo(onHeap: Long, offHeap: Long, disk: Long)

That way you don't need to do things like `_1` and `_2` below, which, IMO, 
make the code less readable.
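For illustration, the reviewer's suggestion might look like the following (a hedged sketch; `NonRddStorageInfo` and its field names are hypothetical, not the final Spark API):

```scala
// Hypothetical case class replacing the (Long, Long, Long) tuple, so call
// sites read info.onHeapUsage instead of the opaque _1/_2/_3 accessors.
case class NonRddStorageInfo(var onHeapUsage: Long, var offHeapUsage: Long, var diskUsage: Long)

object NonRddStorageInfoDemo {
  def main(args: Array[String]): Unit = {
    val info = NonRddStorageInfo(0L, 0L, 0L)
    info.onHeapUsage += 1024L  // update one counter by name
    info.diskUsage += 4096L
    println(info)  // NonRddStorageInfo(1024,0,4096)
  }
}
```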





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-16 Thread jsoltren
Github user jsoltren commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r106470033
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala ---
@@ -87,8 +87,13 @@ case class 
SparkListenerEnvironmentUpdate(environmentDetails: Map[String, Seq[(S
   extends SparkListenerEvent
 
 @DeveloperApi
-case class SparkListenerBlockManagerAdded(time: Long, blockManagerId: 
BlockManagerId, maxMem: Long)
-  extends SparkListenerEvent
+case class SparkListenerBlockManagerAdded(
+time: Long,
+blockManagerId: BlockManagerId,
+maxMem: Long,
+maxOnHeapMem: Option[Long] = None,
+maxOffHeapMem: Option[Long] = None) extends SparkListenerEvent {
+}
--- End diff --

I assume that, with this, we get JSON serialization of 
SparkListenerBlockManagerAdded events for free? It's important that an external 
history server be able to see the new maxOnHeapMem and maxOffHeapMem values.





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-16 Thread jsoltren
Github user jsoltren commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r106468328
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/BlockManagerSource.scala ---
@@ -26,35 +26,39 @@ private[spark] class BlockManagerSource(val 
blockManager: BlockManager)
   override val metricRegistry = new MetricRegistry()
   override val sourceName = "BlockManager"
 
-  metricRegistry.register(MetricRegistry.name("memory", "maxMem_MB"), new 
Gauge[Long] {
-override def getValue: Long = {
-  val storageStatusList = blockManager.master.getStorageStatus
-  val maxMem = storageStatusList.map(_.maxMem).sum
-  maxMem / 1024 / 1024
-}
-  })
-
-  metricRegistry.register(MetricRegistry.name("memory", 
"remainingMem_MB"), new Gauge[Long] {
-override def getValue: Long = {
-  val storageStatusList = blockManager.master.getStorageStatus
-  val remainingMem = storageStatusList.map(_.memRemaining).sum
-  remainingMem / 1024 / 1024
-}
-  })
-
-  metricRegistry.register(MetricRegistry.name("memory", "memUsed_MB"), new 
Gauge[Long] {
-override def getValue: Long = {
-  val storageStatusList = blockManager.master.getStorageStatus
-  val memUsed = storageStatusList.map(_.memUsed).sum
-  memUsed / 1024 / 1024
-}
-  })
-
-  metricRegistry.register(MetricRegistry.name("disk", "diskSpaceUsed_MB"), 
new Gauge[Long] {
-override def getValue: Long = {
-  val storageStatusList = blockManager.master.getStorageStatus
-  val diskSpaceUsed = storageStatusList.map(_.diskUsed).sum
-  diskSpaceUsed / 1024 / 1024
-}
-  })
+  private def registerGauge[T](name: String, f: BlockManagerMaster => T): 
Unit = {
+metricRegistry.register(name, new Gauge[T] {
+  override def getValue: T = f(blockManager.master)
+})
+  }
+
+  registerGauge(MetricRegistry.name("memory", "maxMem_MB"),
+_.getStorageStatus.map(_.maxMem).sum / 1024 / 1024)
--- End diff --

Not related, but this change is the right time to get rid of "/1024/1024" 
magic number arithmetic. Ideally you could just do something like 
sum.toMegabytes, which pulls the arithmetic definition somewhere else. This is 
sprinkled through enough times that I'd like to see it encapsulated in one 
place.
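The encapsulation suggested above could be sketched as follows (`bytesToMb` is an illustrative name, not an existing Spark helper):

```scala
// One home for the bytes -> megabytes conversion, instead of sprinkling
// the magic `/ 1024 / 1024` arithmetic across every gauge registration.
object StorageUnits {
  def bytesToMb(bytes: Long): Long = bytes / 1024 / 1024
}

object StorageUnitsDemo {
  def main(args: Array[String]): Unit = {
    val maxMemBytes = 512L * 1024 * 1024
    println(StorageUnits.bytesToMb(maxMemBytes))  // 512
  }
}
```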





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-16 Thread jsoltren
Github user jsoltren commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r106467740
  
--- Diff: 
core/src/main/resources/org/apache/spark/ui/static/executorspage.js ---
@@ -350,7 +366,12 @@ $(document).ready(function () {
 {data: 'rddBlocks'},
 {
 data: function (row, type) {
-return type === 'display' ? 
(formatBytes(row.memoryUsed, type) + ' / ' + formatBytes(row.maxMemory, type)) 
: row.memoryUsed;
+return type === 'display' ? 
(formatBytes(row.onHeapMemoryUsed, type) + ' / ' + 
formatBytes(row.maxOnHeapMemory, type)) : row.maxOnHeapMemory
+}
+},
+{
+data: function (row, type) {
+return type === 'display' ? 
(formatBytes(row.offHeapMemoryUsed, type) + ' / ' + 
formatBytes(row.maxOffHeapMemory, type)) : row.maxOffHeapMemory
--- End diff --

This all looks correct, including the field names and using formatBytes.





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-19 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r106838695
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/BlockManagerSource.scala ---
@@ -26,35 +26,39 @@ private[spark] class BlockManagerSource(val 
blockManager: BlockManager)
   override val metricRegistry = new MetricRegistry()
   override val sourceName = "BlockManager"
 
-  metricRegistry.register(MetricRegistry.name("memory", "maxMem_MB"), new 
Gauge[Long] {
-override def getValue: Long = {
-  val storageStatusList = blockManager.master.getStorageStatus
-  val maxMem = storageStatusList.map(_.maxMem).sum
-  maxMem / 1024 / 1024
-}
-  })
-
-  metricRegistry.register(MetricRegistry.name("memory", 
"remainingMem_MB"), new Gauge[Long] {
-override def getValue: Long = {
-  val storageStatusList = blockManager.master.getStorageStatus
-  val remainingMem = storageStatusList.map(_.memRemaining).sum
-  remainingMem / 1024 / 1024
-}
-  })
-
-  metricRegistry.register(MetricRegistry.name("memory", "memUsed_MB"), new 
Gauge[Long] {
-override def getValue: Long = {
-  val storageStatusList = blockManager.master.getStorageStatus
-  val memUsed = storageStatusList.map(_.memUsed).sum
-  memUsed / 1024 / 1024
-}
-  })
-
-  metricRegistry.register(MetricRegistry.name("disk", "diskSpaceUsed_MB"), 
new Gauge[Long] {
-override def getValue: Long = {
-  val storageStatusList = blockManager.master.getStorageStatus
-  val diskSpaceUsed = storageStatusList.map(_.diskUsed).sum
-  diskSpaceUsed / 1024 / 1024
-}
-  })
+  private def registerGauge[T](name: String, f: BlockManagerMaster => T): 
Unit = {
+metricRegistry.register(name, new Gauge[T] {
+  override def getValue: T = f(blockManager.master)
+})
+  }
+
+  registerGauge(MetricRegistry.name("memory", "maxMem_MB"),
+_.getStorageStatus.map(_.maxMem).sum / 1024 / 1024)
--- End diff --

Thanks, I will change it.





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-20 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r106882606
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala ---
@@ -87,8 +87,13 @@ case class 
SparkListenerEnvironmentUpdate(environmentDetails: Map[String, Seq[(S
   extends SparkListenerEvent
 
 @DeveloperApi
-case class SparkListenerBlockManagerAdded(time: Long, blockManagerId: 
BlockManagerId, maxMem: Long)
-  extends SparkListenerEvent
+case class SparkListenerBlockManagerAdded(
+time: Long,
+blockManagerId: BlockManagerId,
+maxMem: Long,
+maxOnHeapMem: Option[Long] = None,
+maxOffHeapMem: Option[Long] = None) extends SparkListenerEvent {
+}
--- End diff --

Yes, I think so.





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-20 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r106930720
  
--- Diff: 
core/src/main/resources/org/apache/spark/ui/static/executorspage.js ---
@@ -172,6 +172,15 @@ function totalDurationColor(totalGCTime, 
totalDuration) {
 return (totalGCTime > GCTimePercent * totalDuration) ? "white" : 
"black";
 }
 
+function memoryUsageTooltip(onHeap, maxOnHeap, offHeap, maxOffHeap, 
onHeapSum, offHeapSum, type) {
+return ("" +
+formatBytes(onHeapSum, type) + " / " + formatBytes(offHeapSum, 
type) + "");
--- End diff --

this is pretty hard to read (at least for me, though I don't really know js 
...). Maybe refactor so you prepare two tmp variables, one with the tooltip 
text and another with the normal text, and then return the fully assembled 
string?





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-20 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r106935786
  
--- Diff: core/src/main/scala/org/apache/spark/storage/StorageUtils.scala 
---
@@ -176,17 +185,42 @@ class StorageStatus(val blockManagerId: 
BlockManagerId, val maxMem: Long) {
*/
   def numRddBlocksById(rddId: Int): Int = 
_rddBlocks.get(rddId).map(_.size).getOrElse(0)
 
+  /** Return the max memory can be used by this block manager. */
+  def maxMem: Long = maxOnHeapMem + maxOffHeapMem
+
   /** Return the memory remaining in this block manager. */
-  def memRemaining: Long = maxMem - memUsed
+  def memRemaining: Long = onHeapMemRemaining + offHeapMemRemaining
+
+  /** Return the memory used by caching RDDs */
+  def cacheSize: Long = onHeapCacheSize + offHeapCacheSize
 
   /** Return the memory used by this block manager. */
-  def memUsed: Long = _nonRddStorageInfo._1 + cacheSize
+  def memUsed: Long = onHeapMemUsed + offHeapMemUsed
 
-  /** Return the memory used by caching RDDs */
-  def cacheSize: Long = _rddBlocks.keys.toSeq.map(memUsedByRdd).sum
+  /** Return the on-heap memory remaining in this block manager. */
+  def onHeapMemRemaining: Long = maxOnHeapMem - onHeapMemUsed
+
+  /** Return the off-heap memory remaining in this block manager. */
+  def offHeapMemRemaining: Long = maxOffHeapMem - offHeapMemUsed
+
+  /** Return the on-heap memory used by this block manager. */
+  def onHeapMemUsed: Long = _nonRddStorageInfo._1 + onHeapCacheSize
+
+  /** Return the off-heap memory used by this block manager. */
+  def offHeapMemUsed: Long = _nonRddStorageInfo._2 + offHeapCacheSize
+
+  /** Return the memory used by on-heap caching RDDs */
+  def onHeapCacheSize: Long = {
+_rddStorageInfo.filter(!_._2._3.useOffHeap).map(_._2._1).sum
--- End diff --

You can combine the filter & map with `collect`. I think it's a bit clearer 
and also avoids the extra intermediate structure.

```scala
_rddStorageInfo.collect {
  case (_, (memoryUsed, _, storageStatus)) if !storageStatus.useOffHeap => 
memoryUsed
}.sum
```
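
A minimal, self-contained illustration of this `collect` pattern (the real map holds a `StorageLevel`; here a plain `useOffHeap: Boolean` stands in so the sketch runs on its own):

```scala
object CollectVsFilterMap {
  // stand-in for _rddStorageInfo: rddId -> (memoryUsed, diskUsed, useOffHeap)
  val rddStorageInfo: Map[Int, (Long, Long, Boolean)] = Map(
    1 -> (100L, 0L, false),  // cached on-heap
    2 -> (200L, 0L, true),   // cached off-heap
    3 -> (50L, 0L, false))   // cached on-heap

  // collect fuses the filter and the projection into a single pass,
  // avoiding the intermediate map that filter(...).map(...) would build
  def onHeapCacheSize: Long = rddStorageInfo.collect {
    case (_, (memoryUsed, _, useOffHeap)) if !useOffHeap => memoryUsed
  }.sum

  def main(args: Array[String]): Unit =
    println(onHeapCacheSize)  // 150
}
```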





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-20 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r106930443
  
--- Diff: 
core/src/main/resources/org/apache/spark/ui/static/executorspage.js ---
@@ -172,6 +172,15 @@ function totalDurationColor(totalGCTime, 
totalDuration) {
 return (totalGCTime > GCTimePercent * totalDuration) ? "white" : 
"black";
 }
 
+function memoryUsageTooltip(onHeap, maxOnHeap, offHeap, maxOffHeap, 
onHeapSum, offHeapSum, type) {
--- End diff --

`onHeapSum` and `offHeapSum` mean something completely different, right?  
aren't they more like `totalMemoryUsed` and `totalMemoryAvailable`?





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-20 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r106931457
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -180,7 +180,8 @@ private[spark] class BlockManager(
 
 val idFromMaster = master.registerBlockManager(
   id,
-  maxMemory,
+  maxOffHeapMemory,
+  maxOffHeapMemory,
--- End diff --

one of these should be on heap





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-22 Thread ajbozarth
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r107559155
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/BlockManagerSource.scala ---
@@ -26,35 +26,39 @@ private[spark] class BlockManagerSource(val 
blockManager: BlockManager)
   override val metricRegistry = new MetricRegistry()
   override val sourceName = "BlockManager"
 
-  metricRegistry.register(MetricRegistry.name("memory", "maxMem_MB"), new 
Gauge[Long] {
-override def getValue: Long = {
-  val storageStatusList = blockManager.master.getStorageStatus
-  val maxMem = storageStatusList.map(_.maxMem).sum
-  maxMem / 1024 / 1024
-}
-  })
-
-  metricRegistry.register(MetricRegistry.name("memory", 
"remainingMem_MB"), new Gauge[Long] {
-override def getValue: Long = {
-  val storageStatusList = blockManager.master.getStorageStatus
-  val remainingMem = storageStatusList.map(_.memRemaining).sum
-  remainingMem / 1024 / 1024
-}
-  })
-
-  metricRegistry.register(MetricRegistry.name("memory", "memUsed_MB"), new 
Gauge[Long] {
-override def getValue: Long = {
-  val storageStatusList = blockManager.master.getStorageStatus
-  val memUsed = storageStatusList.map(_.memUsed).sum
-  memUsed / 1024 / 1024
-}
-  })
-
-  metricRegistry.register(MetricRegistry.name("disk", "diskSpaceUsed_MB"), 
new Gauge[Long] {
-override def getValue: Long = {
-  val storageStatusList = blockManager.master.getStorageStatus
-  val diskSpaceUsed = storageStatusList.map(_.diskUsed).sum
-  diskSpaceUsed / 1024 / 1024
-}
-  })
+  private def registerGauge(name: String, f: BlockManagerMaster => Long): 
Unit = {
+metricRegistry.register(name, new Gauge[Long] {
+  override def getValue: Long = f(blockManager.master) / 1024 / 1024
--- End diff --

Nothing wrong here, but using `f` does lower readability; it took me a few 
reads to figure out what value `f` returned.
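
One way to keep the helper but make it read more clearly, sketched with a local `Gauge` stand-in (the real one is Dropwizard's `com.codahale.metrics.Gauge`); the parameter name `bytesFromMaster` is illustrative, not the PR's actual code:

```scala
import scala.collection.mutable

// Local stand-in for Dropwizard's Gauge, so this sketch compiles alone.
trait Gauge[T] { def getValue: T }

object GaugeNamingSketch {
  private val registry = mutable.Map.empty[String, Gauge[Long]]

  // Naming the parameter after what it yields (bytes) makes the MB
  // conversion inside the helper easier to follow than an opaque `f`.
  private def registerMbGauge(name: String, bytesFromMaster: () => Long): Unit =
    registry(name) = new Gauge[Long] {
      override def getValue: Long = bytesFromMaster() / 1024 / 1024
    }

  def value(name: String): Long = registry(name).getValue

  def main(args: Array[String]): Unit = {
    registerMbGauge("memory.maxMem_MB", () => 3L * 1024 * 1024)
    println(value("memory.maxMem_MB"))  // 3
  }
}
```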





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-22 Thread ajbozarth
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r107557907
  
--- Diff: 
core/src/main/resources/org/apache/spark/ui/static/executorspage.js ---
@@ -378,7 +394,37 @@ $(document).ready(function () {
 {data: 'rddBlocks'},
 {
 data: function (row, type) {
-return type === 'display' ? 
(formatBytes(row.memoryUsed, type) + ' / ' + formatBytes(row.maxMemory, type)) 
: row.memoryUsed;
+if (type !== 'display')
+return row.maxOnHeapMemory + 
row.maxOffHeapMemory;
--- End diff --

I don't think you meant to use the `max*` vars here





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-22 Thread ajbozarth
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r107557948
  
--- Diff: 
core/src/main/resources/org/apache/spark/ui/static/executorspage.js ---
@@ -378,7 +394,37 @@ $(document).ready(function () {
 {data: 'rddBlocks'},
 {
 data: function (row, type) {
-return type === 'display' ? 
(formatBytes(row.memoryUsed, type) + ' / ' + formatBytes(row.maxMemory, type)) 
: row.memoryUsed;
+if (type !== 'display')
+return row.maxOnHeapMemory + 
row.maxOffHeapMemory;
+else
+var memoryUsed = row.onHeapMemoryUsed 
+ row.offHeapMemoryUsed;
+var maxMemory = row.maxOnHeapMemory + 
row.maxOffHeapMemory;
+return (formatBytes(memoryUsed, type) 
+ ' / ' +
+formatBytes(maxMemory, type));
+}
+},
+{
+data: function (row, type) {
+if (type !== 'display')
+return row.maxOnHeapMemory;
+else
+return 
(formatBytes(row.onHeapMemoryUsed, type) + ' / ' +
+formatBytes(row.maxOnHeapMemory, 
type));
+},
+"fnCreatedCell": function (nTd, sData, oData, 
iRow, iCol) {
+$(nTd).addClass('on_heap_memory')
+}
+},
+{
+data: function (row, type) {
+if (type !== 'display')
+return row.maxOffHeapMemory;
--- End diff --

and here





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-22 Thread ajbozarth
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r107557914
  
--- Diff: 
core/src/main/resources/org/apache/spark/ui/static/executorspage.js ---
@@ -378,7 +394,37 @@ $(document).ready(function () {
 {data: 'rddBlocks'},
 {
 data: function (row, type) {
-return type === 'display' ? 
(formatBytes(row.memoryUsed, type) + ' / ' + formatBytes(row.maxMemory, type)) 
: row.memoryUsed;
+if (type !== 'display')
+return row.maxOnHeapMemory + 
row.maxOffHeapMemory;
+else
+var memoryUsed = row.onHeapMemoryUsed 
+ row.offHeapMemoryUsed;
+var maxMemory = row.maxOnHeapMemory + 
row.maxOffHeapMemory;
+return (formatBytes(memoryUsed, type) 
+ ' / ' +
+formatBytes(maxMemory, type));
+}
+},
+{
+data: function (row, type) {
+if (type !== 'display')
+return row.maxOnHeapMemory;
--- End diff --

and here





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-22 Thread ajbozarth
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r107559778
  
--- Diff: core/src/main/scala/org/apache/spark/storage/StorageUtils.scala 
---
@@ -60,11 +63,17 @@ class StorageStatus(val blockManagerId: BlockManagerId, 
val maxMem: Long) {
* non-RDD blocks contains only the first 3 fields (in the same order).
*/
   private val _rddStorageInfo = new mutable.HashMap[Int, (Long, Long, 
StorageLevel)]
-  private var _nonRddStorageInfo: (Long, Long) = (0L, 0L)
+
+  // On-heap memory, off-heap memory and disk usage of non rdd storage
+  private var _nonRddStorageInfo: (Long, Long, Long) = (0L, 0L, 0L)
--- End diff --

I agree that a case class would improve readability





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-24 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r107940526
  
--- Diff: core/src/main/scala/org/apache/spark/storage/StorageUtils.scala 
---
@@ -47,24 +50,30 @@ class StorageStatus(val blockManagerId: BlockManagerId, 
val maxMem: Long) {
   private val _nonRddBlocks = new mutable.HashMap[BlockId, BlockStatus]
 
   /**
-   * Storage information of the blocks that entails memory, disk, and 
off-heap memory usage.
+   * Storage information of the blocks that entails on-heap memory, 
off-heap memory and disk usage.
*
* As with the block maps, we store the storage information separately 
for RDD blocks and
* non-RDD blocks for the same reason. In particular, RDD storage 
information is stored
-   * in a map indexed by the RDD ID to the following 4-tuple:
+   * in a map indexed by the RDD ID to the following 3-tuple case class:
*
*   (memory size, disk size, storage level)
*
* We assume that all the blocks that belong to the same RDD have the 
same storage level.
-   * This field is not relevant to non-RDD blocks, however, so the storage 
information for
-   * non-RDD blocks contains only the first 3 fields (in the same order).
*/
-  private val _rddStorageInfo = new mutable.HashMap[Int, (Long, Long, 
StorageLevel)]
-  private var _nonRddStorageInfo: (Long, Long) = (0L, 0L)
+  case class RddStorageInfo(memoryUsage: Long, diskUsage: Long, level: 
StorageLevel)
--- End diff --

`private`
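A sketch of the suggestion (names taken from the quoted diff; in the real code `level` is a `StorageLevel`):

```scala
// Sketch: scope both helper case classes as private so they stay an
// implementation detail of StorageStatus instead of leaking into the
// public @DeveloperApi surface.
private case class RddStorageInfo(memoryUsage: Long, diskUsage: Long, level: StorageLevel)

private case class NonRddStorageInfo(var onHeapUsage: Long, var offHeapUsage: Long, var diskUsage: Long)
```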





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-24 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r107939634
  
--- Diff: core/src/main/scala/org/apache/spark/storage/StorageUtils.scala 
---
@@ -47,24 +50,30 @@ class StorageStatus(val blockManagerId: BlockManagerId, 
val maxMem: Long) {
   private val _nonRddBlocks = new mutable.HashMap[BlockId, BlockStatus]
 
   /**
-   * Storage information of the blocks that entails memory, disk, and 
off-heap memory usage.
+   * Storage information of the blocks that entails on-heap memory, 
off-heap memory and disk usage.
*
* As with the block maps, we store the storage information separately 
for RDD blocks and
* non-RDD blocks for the same reason. In particular, RDD storage 
information is stored
-   * in a map indexed by the RDD ID to the following 4-tuple:
+   * in a map indexed by the RDD ID to the following 3-tuple case class:
*
*   (memory size, disk size, storage level)
*
* We assume that all the blocks that belong to the same RDD have the 
same storage level.
-   * This field is not relevant to non-RDD blocks, however, so the storage 
information for
-   * non-RDD blocks contains only the first 3 fields (in the same order).
*/
-  private val _rddStorageInfo = new mutable.HashMap[Int, (Long, Long, 
StorageLevel)]
-  private var _nonRddStorageInfo: (Long, Long) = (0L, 0L)
+  case class RddStorageInfo(memoryUsage: Long, diskUsage: Long, level: 
StorageLevel)
+  private val _rddStorageInfo = new mutable.HashMap[Int, RddStorageInfo]
+
+  // On-heap memory, off-heap memory and disk usage of non rdd storage
+  case class NonRddStorageInfo(var onHeapUsage: Long, var offHeapUsage: 
Long, var diskUsage: Long)
--- End diff --

`private`

and one super-nit: I find comments that just repeat field or method names 
to be pretty pointless, so I'd delete that comment.





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-24 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r107939502
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/StorageStatusListener.scala ---
@@ -74,8 +74,9 @@ class StorageStatusListener(conf: SparkConf) extends 
SparkListener {
 synchronized {
   val blockManagerId = blockManagerAdded.blockManagerId
   val executorId = blockManagerId.executorId
-  val maxMem = blockManagerAdded.maxMem
-  val storageStatus = new StorageStatus(blockManagerId, maxMem)
+  val maxOnHeapMem = 
blockManagerAdded.maxOnHeapMem.getOrElse(blockManagerAdded.maxMem)
+  val maxOffHeapMem = blockManagerAdded.maxOffHeapMem.getOrElse(0L)
--- End diff --

the only time these fields will be `None` is when reading from old event 
logs, right?  If so, it's worth adding that as a comment above this.





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-24 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r107940464
  
--- Diff: core/src/main/scala/org/apache/spark/storage/StorageUtils.scala 
---
@@ -47,24 +50,30 @@ class StorageStatus(val blockManagerId: BlockManagerId, 
val maxMem: Long) {
   private val _nonRddBlocks = new mutable.HashMap[BlockId, BlockStatus]
 
   /**
-   * Storage information of the blocks that entails memory, disk, and 
off-heap memory usage.
+   * Storage information of the blocks that entails on-heap memory, 
off-heap memory and disk usage.
*
* As with the block maps, we store the storage information separately 
for RDD blocks and
* non-RDD blocks for the same reason. In particular, RDD storage 
information is stored
-   * in a map indexed by the RDD ID to the following 4-tuple:
+   * in a map indexed by the RDD ID to the following 3-tuple case class:
*
*   (memory size, disk size, storage level)
--- End diff --

this is out of date now that you have a separate case class.  You could 
just get rid of this comment describing it since the names on the case class 
are pretty clear.





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-30 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r109015878
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/StorageStatusListener.scala ---
@@ -74,8 +74,11 @@ class StorageStatusListener(conf: SparkConf) extends 
SparkListener {
 synchronized {
   val blockManagerId = blockManagerAdded.blockManagerId
   val executorId = blockManagerId.executorId
-  val maxMem = blockManagerAdded.maxMem
-  val storageStatus = new StorageStatus(blockManagerId, maxMem)
+  // This two fields are compatible with old event logs, in which 
there only has max on heap
+  // memory in the event log. So maxOnHeapMem will use maxMem, 
maxOffHeapMem will set to 0.
+  val maxOnHeapMem = 
blockManagerAdded.maxOnHeapMem.getOrElse(blockManagerAdded.maxMem)
--- End diff --

actually wait ... are you sure about this?  it looks to me like in old 
event logs, we actually don't know the breakdown between on-heap and off-heap.  
I'm pretty sure the value in old event logs starts from here:


https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L154

so then we have to decide how to represent that old info.  The options are:

a) don't show anything, with an empty string in the display, and the types 
are `Option` even in the rest api
b) show a value of `0` for both of them if they are missing
c) assume all the memory was onheap

this is implementing (c).  I'd prefer (a), if my understanding is correct.
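To make option (a) concrete, a rough sketch (illustrative only, not the merged code; the names are assumptions based on the diffs in this thread):

```scala
// Illustrative sketch of option (a): keep the breakdown Option-typed, so a
// value missing from an old event log stays None -- rendered as an empty
// string in the UI and simply absent from the REST JSON -- rather than a
// misleading 0 or an assumed all-on-heap split.
case class ExecMemoryInfo(maxOnHeapMemory: Option[Long], maxOffHeapMemory: Option[Long])

def displayMem(mem: Option[Long]): String = mem.map(b => s"$b B").getOrElse("")
```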





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-31 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r109128494
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/StorageStatusListener.scala ---
@@ -74,8 +74,11 @@ class StorageStatusListener(conf: SparkConf) extends 
SparkListener {
 synchronized {
   val blockManagerId = blockManagerAdded.blockManagerId
   val executorId = blockManagerId.executorId
-  val maxMem = blockManagerAdded.maxMem
-  val storageStatus = new StorageStatus(blockManagerId, maxMem)
+  // This two fields are compatible with old event logs, in which 
there only has max on heap
+  // memory in the event log. So maxOnHeapMem will use maxMem, 
maxOffHeapMem will set to 0.
+  val maxOnHeapMem = 
blockManagerAdded.maxOnHeapMem.getOrElse(blockManagerAdded.maxMem)
--- End diff --

This was changed by my last PR about on-heap memory. With that patch, max 
memory became the sum of the two memories. But if the user has an older event log, 
then this max memory refers only to on-heap memory.

Let me update the code.





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-31 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r109184652
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/StorageStatusListener.scala ---
@@ -74,8 +74,11 @@ class StorageStatusListener(conf: SparkConf) extends 
SparkListener {
 synchronized {
   val blockManagerId = blockManagerAdded.blockManagerId
   val executorId = blockManagerId.executorId
-  val maxMem = blockManagerAdded.maxMem
-  val storageStatus = new StorageStatus(blockManagerId, maxMem)
+  // This two fields are compatible with old event logs, in which 
there only has max on heap
+  // memory in the event log. So maxOnHeapMem will use maxMem, 
maxOffHeapMem will set to 0.
+  val maxOnHeapMem = 
blockManagerAdded.maxOnHeapMem.getOrElse(blockManagerAdded.maxMem)
--- End diff --

ah I see, thanks for explaining that.  But since that went into Spark 2.1, 
the meaning of that value in old event logs could be either, right?  





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-31 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r109271875
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/StorageStatusListener.scala ---
@@ -74,8 +74,11 @@ class StorageStatusListener(conf: SparkConf) extends 
SparkListener {
 synchronized {
   val blockManagerId = blockManagerAdded.blockManagerId
   val executorId = blockManagerId.executorId
-  val maxMem = blockManagerAdded.maxMem
-  val storageStatus = new StorageStatus(blockManagerId, maxMem)
+  // This two fields are compatible with old event logs, in which 
there only has max on heap
+  // memory in the event log. So maxOnHeapMem will use maxMem, 
maxOffHeapMem will set to 0.
+  val maxOnHeapMem = 
blockManagerAdded.maxOnHeapMem.getOrElse(blockManagerAdded.maxMem)
--- End diff --

Yes, right. So I should figure out a way to be compatible with both.
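One hypothetical way to handle both formats (a sketch only, not necessarily the code that was eventually merged):

```scala
// Hypothetical sketch: trust the explicit breakdown when both fields are
// present (new logs); otherwise leave the breakdown undefined, since for
// old logs only the aggregate maxMem is reliable.
val (maxOnHeapMem, maxOffHeapMem): (Option[Long], Option[Long]) =
  (blockManagerAdded.maxOnHeapMem, blockManagerAdded.maxOffHeapMem) match {
    case (Some(on), Some(off)) => (Some(on), Some(off)) // breakdown known
    case _                     => (None, None)          // old logs: unknown split
  }
```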





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-04-01 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r109282643
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala 
---
@@ -276,7 +276,8 @@ class BlockManagerMasterEndpoint(
 
   private def storageStatus: Array[StorageStatus] = {
 blockManagerInfo.map { case (blockManagerId, info) =>
-  new StorageStatus(blockManagerId, info.maxMem, info.blocks.asScala)
+  new StorageStatus(blockManagerId, info.maxMem, 
Some(info.maxOnHeapMem),
+Some(info.maxOffHeapMem), info.blocks.asScala)
--- End diff --

Added one more field, `maxMem`. When replaying storage info from an old event 
log, only the `maxMem` field is valid; the other two fields (`maxOnHeapMem` and 
`maxOffHeapMem`) will be None.





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-04-01 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r109282694
  
--- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala ---
@@ -75,7 +75,11 @@ class ExecutorSummary private[spark](
 val totalShuffleWrite: Long,
 val isBlacklisted: Boolean,
 val maxMemory: Long,
-val executorLogs: Map[String, String])
+val executorLogs: Map[String, String],
+val onHeapMemoryUsed: Option[Long],
+val offHeapMemoryUsed: Option[Long],
+val maxOnHeapMemory: Option[Long],
+val maxOffHeapMemory: Option[Long])
--- End diff --

These fields are now changed to `Option`, so they will not be printed out for 
old event logs.





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-04-01 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r109282721
  
--- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala ---
@@ -111,7 +115,11 @@ class RDDDataDistribution private[spark](
 val address: String,
 val memoryUsed: Long,
 val memoryRemaining: Long,
-val diskUsed: Long)
+val diskUsed: Long,
+val onHeapMemoryUsed: Long,
+val offHeapMemoryUsed: Long,
+val onHeapMemoryRemaining: Long,
+val offHeapMemoryRemaining: Long)
--- End diff --

These fields still use `Long` instead of `Option`, since only the live UI 
can access this REST API, and in the live UI everything is up to date.





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-04-04 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r109830395
  
--- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsPage.scala 
---
@@ -115,8 +115,9 @@ private[spark] object ExecutorsPage {
 val rddBlocks = status.numBlocks
 val memUsed = status.memUsed
 val maxMem = status.maxMem
-val onHeapMemUsed = status.onHeapMemUsed
-val offHeapMemUsed = status.offHeapMemUsed
+// Only maxOnHeapMem and maxOffHeapMem are defined these two fields 
are not None.
--- End diff --

I don't understand this comment.  Maybe you mean "status.onHeapMemUsed is 
only valid if maxOnHeapMem is defined"?





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-04-04 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r109830705
  
--- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsPage.scala 
---
@@ -81,6 +115,11 @@ private[spark] object ExecutorsPage {
 val rddBlocks = status.numBlocks
 val memUsed = status.memUsed
 val maxMem = status.maxMem
+// Only maxOnHeapMem and maxOffHeapMem are defined these two fields 
are not None.
--- End diff --

I don't understand this comment. Maybe you mean "status.onHeapMemUsed is 
only valid if maxOnHeapMem is defined"?





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-04-04 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r109831142
  
--- Diff: core/src/main/scala/org/apache/spark/storage/StorageUtils.scala 
---
@@ -35,7 +35,13 @@ import org.apache.spark.internal.Logging
  * class cannot mutate the source of the information. Accesses are not 
thread-safe.
  */
 @DeveloperApi
-class StorageStatus(val blockManagerId: BlockManagerId, val maxMem: Long) {
+class StorageStatus(
+val blockManagerId: BlockManagerId,
+// Explicitly adding this maxMemory field to handle maxOnHeapMem and 
maxOffHeapMem not
+// existing issue, this happened when trying to replay old event log.
+val maxMemory: Long,
+val maxOnHeapMem: Option[Long],
+val maxOffHeapMem: Option[Long]) {
--- End diff --

I find the comment you have here a little confusing.  I'd get rid of it and 
put in something more like what you have where you create the `StorageStatus` 
in `BlockManagerMasterEndpoint`, something like:

The on-heap and off-heap memory are always defined for new applications, but 
they can be missing if we are replaying old event logs.





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-04-04 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r109830934
  
--- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala ---
@@ -111,7 +115,11 @@ class RDDDataDistribution private[spark](
 val address: String,
 val memoryUsed: Long,
 val memoryRemaining: Long,
-val diskUsed: Long)
+val diskUsed: Long,
+val onHeapMemoryUsed: Long,
+val offHeapMemoryUsed: Long,
+val onHeapMemoryRemaining: Long,
+val offHeapMemoryRemaining: Long)
--- End diff --

I don't see any reason why the history server *couldn't* show this info.  
Though we would be fine for now defining them without `Option`, I worry that 
it will be a problem if we ever change the history server to show this info, 
since it won't always be available.





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-04-04 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r109832240
  
--- Diff: core/src/main/scala/org/apache/spark/storage/StorageUtils.scala 
---
@@ -176,26 +178,51 @@ class StorageStatus(val blockManagerId: 
BlockManagerId, val maxMem: Long) {
*/
   def numRddBlocksById(rddId: Int): Int = 
_rddBlocks.get(rddId).map(_.size).getOrElse(0)
 
+  /** Return the max memory can be used by this block manager. */
+  def maxMem: Long = maxMemory
+
   /** Return the memory remaining in this block manager. */
   def memRemaining: Long = maxMem - memUsed
 
+  /** Return the memory used by caching RDDs */
+  def cacheSize: Long = onHeapCacheSize + offHeapCacheSize
+
   /** Return the memory used by this block manager. */
-  def memUsed: Long = _nonRddStorageInfo._1 + cacheSize
+  def memUsed: Long = onHeapMemUsed + offHeapMemUsed
 
-  /** Return the memory used by caching RDDs */
-  def cacheSize: Long = _rddBlocks.keys.toSeq.map(memUsedByRdd).sum
+  /** Return the on-heap memory remaining in this block manager. */
+  def onHeapMemRemaining: Option[Long] = maxOnHeapMem.map(_ - 
onHeapMemUsed)
+
+  /** Return the off-heap memory remaining in this block manager. */
+  def offHeapMemRemaining: Option[Long] = maxOffHeapMem.map(_ - 
offHeapMemUsed)
+
+  /** Return the on-heap memory used by this block manager. */
+  def onHeapMemUsed: Long = _nonRddStorageInfo.onHeapUsage + 
onHeapCacheSize
+
+  /** Return the off-heap memory used by this block manager. */
+  def offHeapMemUsed: Long = _nonRddStorageInfo.offHeapUsage + 
offHeapCacheSize
+
+  /** Return the memory used by on-heap caching RDDs */
+  def onHeapCacheSize: Long = _rddStorageInfo.collect {
+  case (_, storageInfo) if !storageInfo.level.useOffHeap => 
storageInfo.memoryUsage
+  }.sum
+
+  /** Return the memory used by off-heap caching RDDs */
+  def offHeapCacheSize: Long = _rddStorageInfo.collect {
--- End diff --

should all methods that return something specific to on-heap / off-heap 
return an `Option`?  It seems like if it was created from event logs, and the 
`SparkListenerBlockManagerAdded` event didn't have an on-heap / off-heap breakdown, 
we can't trust these numbers.

I think it doesn't matter now, in the code you have, since these methods 
won't be called unless you know you have an on-heap / off-heap breakdown 
available.  But I think it would be a bit clearer if you made that explicit 
with the return types of these functions.
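For illustration, the Option-returning shape this suggests might look roughly like the following (a sketch against the fields in the quoted diff, not the merged code):

```scala
// Sketch: gate the per-mode accessor on the breakdown being known, so a
// caller replaying an old event log gets None instead of an untrustworthy 0.
def onHeapCacheSize: Option[Long] = maxOnHeapMem.map { _ =>
  _rddStorageInfo.collect {
    case (_, info) if !info.level.useOffHeap => info.memoryUsage
  }.sum
}
```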





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-04-05 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r109846490
  
--- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsPage.scala 
---
@@ -115,8 +115,9 @@ private[spark] object ExecutorsPage {
 val rddBlocks = status.numBlocks
 val memUsed = status.memUsed
 val maxMem = status.maxMem
-val onHeapMemUsed = status.onHeapMemUsed
-val offHeapMemUsed = status.offHeapMemUsed
+// Only maxOnHeapMem and maxOffHeapMem are defined these two fields 
are not None.
--- End diff --

Yes, sorry about the confusion. Let me clarify the comment.





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-04-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14617

