[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-27 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2131#discussion_r16760036
  
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -68,7 +68,9 @@ private[spark] class CacheManager(blockManager: 
BlockManager) extends Logging {
   // Otherwise, cache the values and keep track of any updates in 
block statuses
   val updatedBlocks = new ArrayBuffer[(BlockId, BlockStatus)]
   val cachedValues = putInBlockManager(key, computedValues, 
storageLevel, updatedBlocks)
-  context.taskMetrics.updatedBlocks = Some(updatedBlocks)
+  val metrics = context.taskMetrics
+  val lastUpdatedBlocks = 
metrics.updatedBlocks.getOrElse(Seq[(BlockId, BlockStatus)]())
+  metrics.updatedBlocks = Some(lastUpdatedBlocks ++ 
updatedBlocks.toSeq)
--- End diff --

I raised this in the other PR and since we have a new one I'll just ask 
again. Can you explain what this fixes? My understanding is that this can only 
be called once per task, and since this is the only place where we set 
`updatedBlocks` I don't see how the original `TaskMetrics` could already have 
updated blocks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-27 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2131#discussion_r16760052
  
--- Diff: core/src/main/scala/org/apache/spark/ui/storage/StorageTab.scala 
---
@@ -70,8 +70,10 @@ class StorageListener(storageStatusListener: 
StorageStatusListener) extends Spar
   }
 
   override def onStageCompleted(stageCompleted: 
SparkListenerStageCompleted) = synchronized {
-// Remove all partitions that are no longer cached
-_rddInfoMap.retain { case (_, info) = info.numCachedPartitions  0 }
+// Remove all partitions that are no longer cached in current 
completed stage
+val completedRddInfoIds = Set[Int]() ++ 
stageCompleted.stageInfo.rddInfos.map(r = r.id)
+_rddInfoMap.retain { case (id, info) =
+   !completedRddInfoIds.contains(id) || 
info.numCachedPartitions  0 }
--- End diff --

also, the style should be
```
_rddInfoMap.retain { case (id, info) =
  !completed...
}
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-27 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2131#discussion_r16760122
  
--- Diff: core/src/test/scala/org/apache/spark/CacheManagerSuite.scala ---
@@ -87,4 +99,12 @@ class CacheManagerSuite extends FunSuite with 
BeforeAndAfter with EasyMockSugar
   assert(value.toList === List(1, 2, 3, 4))
 }
   }
+
+  test(verity task metrics updated correctly) {
+blockManager = sc.env.blockManager
+cacheManager = new CacheManager(blockManager)
--- End diff --

is there a reason why you're not using `sc.env.cacheManager`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-27 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/2131#discussion_r16760352
  
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -68,7 +68,9 @@ private[spark] class CacheManager(blockManager: 
BlockManager) extends Logging {
   // Otherwise, cache the values and keep track of any updates in 
block statuses
   val updatedBlocks = new ArrayBuffer[(BlockId, BlockStatus)]
   val cachedValues = putInBlockManager(key, computedValues, 
storageLevel, updatedBlocks)
-  context.taskMetrics.updatedBlocks = Some(updatedBlocks)
+  val metrics = context.taskMetrics
+  val lastUpdatedBlocks = 
metrics.updatedBlocks.getOrElse(Seq[(BlockId, BlockStatus)]())
+  metrics.updatedBlocks = Some(lastUpdatedBlocks ++ 
updatedBlocks.toSeq)
--- End diff --

A single task could call `getOrCompute` multiple times in a chain - if it 
is computing several pipelined RDD's.

```
a.cache().filter(---).cache().count
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-27 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2131#discussion_r16760497
  
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -68,7 +68,9 @@ private[spark] class CacheManager(blockManager: 
BlockManager) extends Logging {
   // Otherwise, cache the values and keep track of any updates in 
block statuses
   val updatedBlocks = new ArrayBuffer[(BlockId, BlockStatus)]
   val cachedValues = putInBlockManager(key, computedValues, 
storageLevel, updatedBlocks)
-  context.taskMetrics.updatedBlocks = Some(updatedBlocks)
+  val metrics = context.taskMetrics
+  val lastUpdatedBlocks = 
metrics.updatedBlocks.getOrElse(Seq[(BlockId, BlockStatus)]())
+  metrics.updatedBlocks = Some(lastUpdatedBlocks ++ 
updatedBlocks.toSeq)
--- End diff --

Ah I see. Makes sense.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-27 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2131#discussion_r16760627
  
--- Diff: core/src/test/scala/org/apache/spark/CacheManagerSuite.scala ---
@@ -87,4 +99,12 @@ class CacheManagerSuite extends FunSuite with 
BeforeAndAfter with EasyMockSugar
   assert(value.toList === List(1, 2, 3, 4))
 }
   }
+
+  test(verity task metrics updated correctly) {
+blockManager = sc.env.blockManager
+cacheManager = new CacheManager(blockManager)
+val context = new TaskContext(0, 0, 0)
+cacheManager.getOrCompute(rdd3, split, context, 
StorageLevel.MEMORY_ONLY)
+assert(context.taskMetrics.updatedBlocks.getOrElse(Seq()).size == 2)
--- End diff --

Can you use `===` here instead of `==`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-27 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2131#discussion_r16761022
  
--- Diff: core/src/main/scala/org/apache/spark/ui/storage/StorageTab.scala 
---
@@ -70,8 +70,10 @@ class StorageListener(storageStatusListener: 
StorageStatusListener) extends Spar
   }
 
   override def onStageCompleted(stageCompleted: 
SparkListenerStageCompleted) = synchronized {
-// Remove all partitions that are no longer cached
-_rddInfoMap.retain { case (_, info) = info.numCachedPartitions  0 }
+// Remove all partitions that are no longer cached in current 
completed stage
+val completedRddInfoIds = Set[Int]() ++ 
stageCompleted.stageInfo.rddInfos.map(r = r.id)
--- End diff --

Also, I would just call this `completedRddIds`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-27 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2131#discussion_r16761636
  
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -68,7 +68,9 @@ private[spark] class CacheManager(blockManager: 
BlockManager) extends Logging {
   // Otherwise, cache the values and keep track of any updates in 
block statuses
   val updatedBlocks = new ArrayBuffer[(BlockId, BlockStatus)]
   val cachedValues = putInBlockManager(key, computedValues, 
storageLevel, updatedBlocks)
-  context.taskMetrics.updatedBlocks = Some(updatedBlocks)
+  val metrics = context.taskMetrics
+  val lastUpdatedBlocks = 
metrics.updatedBlocks.getOrElse(Seq[(BlockId, BlockStatus)]())
+  metrics.updatedBlocks = Some(lastUpdatedBlocks ++ 
updatedBlocks.toSeq)
--- End diff --

@andrewor14 IMHO, the getOrCompute can be called more than once per task 
(indirect recursively). In this code snippet: 

   val rdd1 = sc.parallelize(...).cache()
   val rdd2 = rdd1.map(...).cache()
   val count = rdd2.count()

This code snippet will submit one stage . We take task-1 as an example. 
Task-1 firstly calls getOrCompute(rdd-2) , and then calls getOrCompute(rdd-1) 
inside getOrCompute(rdd-2). Therefore, it will generates and caches block 
rdd-1-1 and  block rdd-2-1 one by one. At the end of getOrCompute(rdd-1), the 
taskMetrics.updatedBlocks of task-1 will be seq(rdd-1-1). Then at the end of 
getOrCompute(rdd-2), the taskMetrics.updatedBlocks will be seq(rdd-1-1, 
rdd-2-1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-27 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2131#discussion_r16762096
  
--- Diff: core/src/test/scala/org/apache/spark/CacheManagerSuite.scala ---
@@ -87,4 +99,12 @@ class CacheManagerSuite extends FunSuite with 
BeforeAndAfter with EasyMockSugar
   assert(value.toList === List(1, 2, 3, 4))
 }
   }
+
+  test(verity task metrics updated correctly) {
+blockManager = sc.env.blockManager
+cacheManager = new CacheManager(blockManager)
+val context = new TaskContext(0, 0, 0)
+cacheManager.getOrCompute(rdd3, split, context, 
StorageLevel.MEMORY_ONLY)
+assert(context.taskMetrics.updatedBlocks.getOrElse(Seq()).size == 2)
--- End diff --

sorry for my poor coding, I will review again


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-27 Thread uncleGen
Github user uncleGen commented on the pull request:

https://github.com/apache/spark/pull/2131#issuecomment-53549872
  
@andrewor14 sorry for my poor coding. Unit test passed locally, test it 
again pls.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2131#issuecomment-53550329
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19308/consoleFull)
 for   PR 2131 at commit 
[`a6a8a0b`](https://github.com/apache/spark/commit/a6a8a0b4a97893a9c47645e0858885192cb41abf).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2131#issuecomment-53554871
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19308/consoleFull)
 for   PR 2131 at commit 
[`a6a8a0b`](https://github.com/apache/spark/commit/a6a8a0b4a97893a9c47645e0858885192cb41abf).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-27 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2131#issuecomment-53610224
  
Thanks, I have merged this into master and branch-1.1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2131


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-26 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/2131

[SPARK-3170][CORE][BUG]:RDD info loss in StorageTab and ExecutorTab

compeleted stage only need to remove its own partitions that are no longer 
cached. However, StorageTab may lost some rdds which are cached actually. Not 
only in StorageTab, ExectutorTab may also lose some rdd info which have 
been overwritten by last rdd in a same task.
1. StorageTab: when multiple stages run simultaneously, completed stage 
will remove rdd info which belong to other stages that are still running.
2. ExectutorTab: taskcontext may lose some updatedBlocks info of  rdds  
in a dependency chain. Like the following example:
 val r1 = sc.paralize(..).cache()
 val r2 = r1.map(...).cache()
 val n = r2.count()

When count the r2, r1 and r2 will be cached finally. So in 
CacheManager.getOrCompute, the taskcontext should contain updatedBlocks of r1 
and r2. Currently, the updatedBlocks only contain the info of r2. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark master_ui_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2131.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2131


commit c82ba82ae90c92244e63811f30e1aeb05608c57a
Author: uncleGen husty...@gmail.com
Date:   2014-08-26T06:54:04Z

Bug Fix: RDD info loss in StorageTab and ExecutorTab




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2131#issuecomment-53383213
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-26 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2131#issuecomment-53454018
  
test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2131#issuecomment-53455017
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19216/consoleFull)
 for   PR 2131 at commit 
[`c82ba82`](https://github.com/apache/spark/commit/c82ba82ae90c92244e63811f30e1aeb05608c57a).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2131#issuecomment-53455192
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19216/consoleFull)
 for   PR 2131 at commit 
[`c82ba82`](https://github.com/apache/spark/commit/c82ba82ae90c92244e63811f30e1aeb05608c57a).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-26 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2131#issuecomment-53455331
  
Hi @uncleGen there's another line too long...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-26 Thread uncleGen
Github user uncleGen commented on the pull request:

https://github.com/apache/spark/pull/2131#issuecomment-53521662
  
Hi @andrewor14, test it again please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-26 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2131#issuecomment-53525012
  
add to whitelist


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2131#issuecomment-53525126
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19284/consoleFull)
 for   PR 2131 at commit 
[`56ea488`](https://github.com/apache/spark/commit/56ea48848fe023a2224692bd348cb5f1fa25d718).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2131#issuecomment-53527156
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19284/consoleFull)
 for   PR 2131 at commit 
[`56ea488`](https://github.com/apache/spark/commit/56ea48848fe023a2224692bd348cb5f1fa25d718).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-26 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2131#discussion_r16758468
  
--- Diff: 
core/src/test/scala/org/apache/spark/ui/storage/StorageTabSuite.scala ---
@@ -162,4 +163,29 @@ class StorageTabSuite extends FunSuite with 
BeforeAndAfter {
 assert(storageListener._rddInfoMap(2).numCachedPartitions === 0)
   }
 
+  test(verity StorageTab contains all cached rdds) {
--- End diff --

verify


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-26 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2131#issuecomment-53528924
  
Looks like this failed one of your own tests


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-26 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2131#discussion_r16758476
  
--- Diff: core/src/main/scala/org/apache/spark/ui/storage/StorageTab.scala 
---
@@ -70,8 +70,10 @@ class StorageListener(storageStatusListener: 
StorageStatusListener) extends Spar
   }
 
   override def onStageCompleted(stageCompleted: 
SparkListenerStageCompleted) = synchronized {
-// Remove all partitions that are no longer cached
-_rddInfoMap.retain { case (_, info) = info.numCachedPartitions  0 }
+// Remove all partitions that are no longer cached in current 
completed stage
+val completedRddInfoIds = Set[Int]() ++ 
stageCompleted.stageInfo.rddInfos.map(r = r.id)
--- End diff --

Why not just use `toSet`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-26 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2131#discussion_r16758471
  
--- Diff: core/src/main/scala/org/apache/spark/ui/storage/StorageTab.scala 
---
@@ -70,8 +70,10 @@ class StorageListener(storageStatusListener: 
StorageStatusListener) extends Spar
   }
 
   override def onStageCompleted(stageCompleted: 
SparkListenerStageCompleted) = synchronized {
-// Remove all partitions that are no longer cached
-_rddInfoMap.retain { case (_, info) = info.numCachedPartitions  0 }
+// Remove all partitions that are no longer cached in current 
completed stage
+val completedRddInfoIds = Set[Int]() ++ 
stageCompleted.stageInfo.rddInfos.map(r = r.id)
+_rddInfoMap.retain { case (id, info) =
+   !completedRddInfoIds.contains(id) || 
info.numCachedPartitions  0 }
--- End diff --

indentation is weird


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-26 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2131#discussion_r16758470
  
--- Diff: core/src/test/scala/org/apache/spark/CacheManagerSuite.scala ---
@@ -87,4 +99,12 @@ class CacheManagerSuite extends FunSuite with 
BeforeAndAfter with EasyMockSugar
   assert(value.toList === List(1, 2, 3, 4))
 }
   }
+
+  test(verity task metrics updated correctly) {
--- End diff --

verify


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org