[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-28 Thread ilganeli
Github user ilganeli commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-76561166
  
Thanks for the detailed write up and the fix. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-28 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-76545044
  
I have a fix at #4835.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-28 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4021#discussion_r25560891
  
--- Diff: core/src/main/scala/org/apache/spark/Accumulators.scala ---
@@ -307,11 +310,22 @@ private[spark] object Accumulators {
 }
   }
 
+  def remove(accId : Long) {
+synchronized {
+  originals.remove(accId)  
+}
+  }
+  
   // Get the values of the local accumulators for the current thread (by 
ID)
   def values: Map[Long, Any] = synchronized {
 val ret = Map[Long, Any]()
 for ((id, accum) <- localAccums.get) {
-  ret(id) = accum.localValue
+  // Since we are now storing weak references, we must check whether 
the underlying data
+  // is valid. 
+  ret(id) = accum.get match {
+case Some(values) => values.localValue
+case None => None
--- End diff --

This was dumb for me to overlook, too: if the data structure is invalid, 
then this would just silently ignore it; even if this was a rare 
error-condition, there should have been a warning here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-28 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-76543201
  
> Actually, we won't leak accumulators on executors because we clear the 
thread-local between tasks.

At this point, I should have spotted that it doesn't make sense to store 
WeakReferences if those references are going to be destroyed at the end of 
tasks anyways.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-28 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-76542760
  
I think I've figured it out: consider the lifecycle of an accumulator in a 
task, say ShuffleMapTask: on the executor, each task deserializes its own copy 
of the RDD inside of its `runTask` method, so the strong reference to the RDD 
disappears at the end of `runTask`.  In `Executor.run()`, we call 
`Accumulators.values` after `runTask` has exited, so there's a small window in 
which the tasks's RDD can be GC'd, causing accumulators to be GC'd as well 
because there are no longer any strong references to them.

The fix is to keep strong references in `localAccums`, since we clear this 
at the end of each task anyways.  I'm glad that I was able to figure out 
precisely _why_ this was necessary and sorry that I missed this during review; 
I'll submit a fix shortly.  In terms of preventative measures, it might be a 
good idea to write up the lifetime / lifecycle of objects' strong references 
whenever we're using WeakReferences, since the process of explicitly writing 
that out would prevent these sorts of mistakes in the future.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-27 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-76512378
  
Another thought: if `register()` is somehow called twice for the same 
accumulator, then it looks like we'll silently overwrite the existing value in 
`localAccums`.  We should probably throw an exception instead, since that 
scenario could lead to lost updates.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-27 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-76511660
  
I think that this patch may have introduced a bug that may cause 
accumulator updates to be lost:
https://issues.apache.org/jira/browse/SPARK-6075

I'm still trying to see if I can spot the problem, but my hunch is that 
maybe the `localAccums` thread-local maps should not hold weak references.  
When deserializing an accumulator in an executor and registering it with 
`localAccums`, is there ever a moment in which the accumulator has no strong 
references pointing to it?  Does someone object hold a strong reference to an 
accumulator while it's being deserialized?  If not, this could lead to it being 
dropped from the `localAccums` map, causing that task's accumulator updates to 
be lost.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-23 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-75666737
  
Not yet.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-23 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4021#discussion_r25215797
  
--- Diff: core/src/main/scala/org/apache/spark/Accumulators.scala ---
@@ -320,7 +334,13 @@ private[spark] object Accumulators {
   def add(values: Map[Long, Any]): Unit = synchronized {
 for ((id, value) <- values) {
   if (originals.contains(id)) {
-originals(id).asInstanceOf[Accumulable[Any, Any]] ++= value
+// Since we are now storing weak references, we must check whether 
the underlying data
+// is valid. 
+originals(id).get match {
+  case Some(accum) => accum.asInstanceOf[Accumulable[Any, Any]] 
++= value
+  case None => 
+throw new IllegalAccessError("Attempted to access garbage 
collected Accumulator.")   
--- End diff --

The exception thrown here is caught at higher levels of the stack.  For 
example, DAGScheduler wraps calls to accumulator methods in a `try` block and 
logs any uncaught exceptions.  Have you run into a case where the current 
behavior causes a problem?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-23 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/4021#discussion_r25209770
  
--- Diff: core/src/main/scala/org/apache/spark/Accumulators.scala ---
@@ -320,7 +334,13 @@ private[spark] object Accumulators {
   def add(values: Map[Long, Any]): Unit = synchronized {
 for ((id, value) <- values) {
   if (originals.contains(id)) {
-originals(id).asInstanceOf[Accumulable[Any, Any]] ++= value
+// Since we are now storing weak references, we must check whether 
the underlying data
+// is valid. 
+originals(id).get match {
+  case Some(accum) => accum.asInstanceOf[Accumulable[Any, Any]] 
++= value
+  case None => 
+throw new IllegalAccessError("Attempted to access garbage 
collected Accumulator.")   
--- End diff --

If Accumulator is garbage collected, should we log and continue ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-22 Thread ilganeli
Github user ilganeli commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-75498643
  
Hi Josh - thanks for the review and suggestions. With regards to the local 
thread cleanup I didn't add anything since as you pointed out it already gets 
cleaned up. Cheers!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-22 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-75496699
  
I've merged this into `master` (1.4.0).  Thanks!

@ilganeli, it looks like your text editor might be leaving trailing 
whitespace on lines, which makes the diffs look kind of messy:


![image](https://cloud.githubusercontent.com/assets/50748/6323879/538870e0-bae6-11e4-8544-308eb56946e7.png)

I fixed this myself when committing, so not a huge deal, but you might 
consider changing your editor settings to prevent htis in the future (I have a 
special vim setting that complains about trailing whitespace, and I think 
there's a way to configure IntelliJ to do the same).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4021


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-22 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-75495739
  
Actually, we won't leak accumulators on executors because we clear the 
thread-local between tasks.  So this is great!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-22 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-75495372
  
I just realized that ContextCleaner won't manage cleanup of accumulators in 
the thread-local accumulator registries in executors, but I don't think this is 
a big deal: we're more concerned about the driver OOMing in a long-running app 
than the executors, and it would be prohibitively expensive to add a bunch of 
driver -> executor RPCs just to reclaim one map entry.

Therefore, this looks good to me.  There's one minor import-ordering nit, 
but I'll just fix that up myself on merge.  Thanks for working on this, esp. 
since this code is a bit tricky to test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-75355750
  
  [Test build #27803 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27803/consoleFull)
 for   PR 4021 at commit 
[`4ba9575`](https://github.com/apache/spark/commit/4ba95754e3f1155b1decbbc7ff77fd540570e518).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-75355753
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27803/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-75353317
  
  [Test build #27803 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27803/consoleFull)
 for   PR 4021 at commit 
[`4ba9575`](https://github.com/apache/spark/commit/4ba95754e3f1155b1decbbc7ff77fd540570e518).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-20 Thread ilganeli
Github user ilganeli commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-75352650
  
Hi @JoshRosen - I can't figure out why this is failing - it passes locally 
for me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-75350635
  
  [Test build #27801 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27801/consoleFull)
 for   PR 4021 at commit 
[`8510943`](https://github.com/apache/spark/commit/8510943acc2a6faab23b7de96339dbe75b63abc9).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-75350643
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27801/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-75348625
  
  [Test build #27801 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27801/consoleFull)
 for   PR 4021 at commit 
[`8510943`](https://github.com/apache/spark/commit/8510943acc2a6faab23b7de96339dbe75b63abc9).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-75348513
  
  [Test build #27800 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27800/consoleFull)
 for   PR 4021 at commit 
[`283a333`](https://github.com/apache/spark/commit/283a3330f76cd1dfd580615bb7cdfac2886dd891).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-75348516
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27800/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-75348339
  
  [Test build #27800 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27800/consoleFull)
 for   PR 4021 at commit 
[`283a333`](https://github.com/apache/spark/commit/283a3330f76cd1dfd580615bb7cdfac2886dd891).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-20 Thread ilganeli
Github user ilganeli commented on a diff in the pull request:

https://github.com/apache/spark/pull/4021#discussion_r25116499
  
--- Diff: core/src/test/scala/org/apache/spark/AccumulatorSuite.scala ---
@@ -135,5 +137,22 @@ class AccumulatorSuite extends FunSuite with Matchers 
with LocalSparkContext {
   resetSparkContext()
 }
   }
+  
+  test ("garbage collection") {
+// Create an accumulator and let it go out of scope to test that it's 
properly garbage collected
+sc = new SparkContext("local", "test")
+var acc: Accumulable[mutable.Set[Any], Any] = sc.accumulable(new 
mutable.HashSet[Any]())
+val accId = acc.id
+val ref = WeakReference(acc)
+
+// Ensure the accumulator is present
+assert(ref.get.isDefined)
+
+// Remove the explicit reference to it and allow weak reference to get 
garbage collected
+acc = null
+System.gc()
+assert(ref.get.isEmpty)
+assert(Accumulators.originals.get(accId).isDefined)
--- End diff --

Hi Josh, how do I register a cleanup task ? I see a private method within 
ContextCleaner: registerForCleanup , that it seems I can use but I don't have 
visibility of that method within the Accumulator class. How would you recommend 
doing this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-75323330
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27789/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-75323305
  
  [Test build #27789 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27789/consoleFull)
 for   PR 4021 at commit 
[`7485a82`](https://github.com/apache/spark/commit/7485a82c4bddff21e63327f0d3cea24b8b7bf7fb).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-20 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4021#discussion_r25100214
  
--- Diff: core/src/test/scala/org/apache/spark/AccumulatorSuite.scala ---
@@ -135,5 +137,22 @@ class AccumulatorSuite extends FunSuite with Matchers 
with LocalSparkContext {
   resetSparkContext()
 }
   }
+  
+  test ("garbage collection") {
+// Create an accumulator and let it go out of scope to test that it's 
properly garbage collected
+sc = new SparkContext("local", "test")
+var acc: Accumulable[mutable.Set[Any], Any] = sc.accumulable(new 
mutable.HashSet[Any]())
+val accId = acc.id
+val ref = WeakReference(acc)
+
+// Ensure the accumulator is present
+assert(ref.get.isDefined)
+
+// Remove the explicit reference to it and allow weak reference to get 
garbage collected
+acc = null
+System.gc()
+assert(ref.get.isEmpty)
+assert(Accumulators.originals.get(accId).isDefined)
--- End diff --

I guess this indicates that we don't garbage-collect the map entry that 
points to the weak accumulator reference.  This memory leak isn't quite as huge 
of a concern as leaking the accumulator itself, since we expect the map entry 
to be a small, fixed-size record (a couple hundred bytes, max, probably).

If we do decide to clean this up, though, we can probably do it by adding a 
synchronized method to the Accumulators object that removes an accumulator, 
then register a cleanup task with ContextCleaner when creating an accumulator.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-20 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-75313991
  
Overall, this looks pretty good to me, since it's a huge improvement over 
the status quo of retaining accumulators indefinitely.  I left one minor 
comment regarding cleanup of the accumulator map entries themselves.  The 
actual memory-leak addressed by my ContextCleaner suggestion isn't as big a 
problem as the original issue fixed here, but using a ContextCleaner cleanup 
task also gives us the chance to output a useful log message when accumulators 
are garbage collected.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-20 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4021#discussion_r25100312
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -746,7 +746,14 @@ class DAGSchedulerSuite extends 
TestKit(ActorSystem("DAGSchedulerSuite")) with F
 completeWithAccumulator(accum.id, taskSets(0), Seq((Success, 42)))
 completeWithAccumulator(accum.id, taskSets(0), Seq((Success, 42)))
 assert(results === Map(0 -> 42))
-assert(Accumulators.originals(accum.id).value === 1)
+
+val accVal = Accumulators.originals(accum.id).get match {
--- End diff --

Instead of doing a match here, you could probably just call `get`, since 
that wil stilll cause the test to fail if the reference expired, but will give 
a more useful error message.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-75310803
  
  [Test build #27789 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27789/consoleFull)
 for   PR 4021 at commit 
[`7485a82`](https://github.com/apache/spark/commit/7485a82c4bddff21e63327f0d3cea24b8b7bf7fb).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-20 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-75310626
  
Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-17 Thread ilganeli
Github user ilganeli commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-74782086
  
Hi @JoshRosen - could I please get a review of this? Thank you!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-73376077
  
  [Test build #27005 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27005/consoleFull)
 for   PR 4021 at commit 
[`c8e0f2b`](https://github.com/apache/spark/commit/c8e0f2bfa787988770a4e02f65d04dab4aaa2a86).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-08 Thread ilganeli
Github user ilganeli commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-73376026
  
Hi @JoshRosen, I've added a test for this patch to the Accumulator Suite. 
Please let me know if you think it's sufficient. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-73389753
  
  [Test build #27018 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27018/consoleFull)
 for   PR 4021 at commit 
[`7485a82`](https://github.com/apache/spark/commit/7485a82c4bddff21e63327f0d3cea24b8b7bf7fb).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-08 Thread ilganeli
Github user ilganeli commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-73389626
  
 Oops. Didn't push the latest commit. That got fixed.
On Sat, Feb 7, 2015 at 4:39 PM Josh Rosen  wrote:

> Looks like a test compilation error:
>
> [error] 
/home/jenkins/workspace/SparkPullRequestBuilder/core/src/test/scala/org/apache/spark/AccumulatorSuite.scala:141:
 type mismatch;
> [error]  found   : Unit
> [error]  required: Long
> [error]   def makeAcc() : Long = {
> [error]  ^
>
> —
> Reply to this email directly or view it on GitHub
> .
>



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-08 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-73386809
  
Looks like a test compilation error:

```
[error] 
/home/jenkins/workspace/SparkPullRequestBuilder/core/src/test/scala/org/apache/spark/AccumulatorSuite.scala:141:
 type mismatch;
[error]  found   : Unit
[error]  required: Long
[error]   def makeAcc() : Long = {
[error]  ^
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-73377606
  
  [Test build #27005 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27005/consoleFull)
 for   PR 4021 at commit 
[`c8e0f2b`](https://github.com/apache/spark/commit/c8e0f2bfa787988770a4e02f65d04dab4aaa2a86).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-73377610
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27005/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-73391865
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27018/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-73391862
  
  [Test build #27018 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27018/consoleFull)
 for   PR 4021 at commit 
[`7485a82`](https://github.com/apache/spark/commit/7485a82c4bddff21e63327f0d3cea24b8b7bf7fb).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-01-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-70117592
  
  [Test build #25606 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25606/consoleFull)
 for   PR 4021 at commit 
[`18d62ec`](https://github.com/apache/spark/commit/18d62ec8906c4ea3fc8d753e889f36f87b539ef5).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-70117605
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25606/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-01-15 Thread ilganeli
Github user ilganeli commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-70105302
  
I've updated the code to throw an exception in the error case you mentioned 
and I've reverted the file permission change. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-01-15 Thread ilganeli
Github user ilganeli commented on a diff in the pull request:

https://github.com/apache/spark/pull/4021#discussion_r23018036
  
--- Diff: core/src/main/scala/org/apache/spark/Accumulators.scala ---
@@ -280,10 +281,12 @@ object AccumulatorParam {
 // TODO: The multi-thread support in accumulators is kind of lame; check
 // if there's a more intuitive way of doing it right
 private[spark] object Accumulators {
-  // TODO: Use soft references? => need to make readObject work properly 
then
-  val originals = Map[Long, Accumulable[_, _]]()
-  val localAccums = new ThreadLocal[Map[Long, Accumulable[_, _]]]() {
-override protected def initialValue() = Map[Long, Accumulable[_, _]]()
+  // Store a WeakReference instead of a StrongReference because this way 
accumulators can be
+  // appropriately garbage collected during long-running jobs and release 
memory
+  type WeakAcc = WeakReference[Accumulable[_, _]]
+  val originals = Map[Long, WeakAcc]()
+  val localAccums = new ThreadLocal[Map[Long, WeakAcc]]() {
--- End diff --

Hi Josh - are you suggesting to replace this snippet with a MapMaker just 
to simplify the initialization code? I believe the usage of either object would 
be the same - do you see a specific advantage to trying to use the MapMaker?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-01-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-70104769
  
  [Test build #25606 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25606/consoleFull)
 for   PR 4021 at commit 
[`18d62ec`](https://github.com/apache/spark/commit/18d62ec8906c4ea3fc8d753e889f36f87b539ef5).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-01-13 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4021#discussion_r22889431
  
--- Diff: core/src/main/scala/org/apache/spark/Accumulators.scala ---
@@ -320,7 +328,12 @@ private[spark] object Accumulators {
   def add(values: Map[Long, Any]): Unit = synchronized {
 for ((id, value) <- values) {
   if (originals.contains(id)) {
-originals(id).asInstanceOf[Accumulable[Any, Any]] ++= value
+// Since we are now storing weak references, we must check whether 
the underlying data
+// is valid. 
+originals(id).get match {
+  case Some(accum) => accum.asInstanceOf[Accumulable[Any, Any]] 
++= value
+  case None => // Do nothing   
--- End diff --

Should this be an error-case?  This seems to mean that you've received an 
accumulator update for an accumulator that's been garbage-collected, which 
seems like it should be an error.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-01-13 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4021#discussion_r22889328
  
--- Diff: core/src/main/scala/org/apache/spark/Accumulators.scala ---
@@ -280,10 +281,12 @@ object AccumulatorParam {
 // TODO: The multi-thread support in accumulators is kind of lame; check
 // if there's a more intuitive way of doing it right
 private[spark] object Accumulators {
-  // TODO: Use soft references? => need to make readObject work properly 
then
-  val originals = Map[Long, Accumulable[_, _]]()
-  val localAccums = new ThreadLocal[Map[Long, Accumulable[_, _]]]() {
-override protected def initialValue() = Map[Long, Accumulable[_, _]]()
+  // Store a WeakReference instead of a StrongReference because this way 
accumulators can be
+  // appropriately garbage collected during long-running jobs and release 
memory
+  type WeakAcc = WeakReference[Accumulable[_, _]]
+  val originals = Map[Long, WeakAcc]()
+  val localAccums = new ThreadLocal[Map[Long, WeakAcc]]() {
--- End diff --

Guava MapMaper supports `weakValues`; not sure if we want to use that here, 
since it's not super Scala-friendly (e.g. returns nulls, etc): 
http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/collect/MapMaker.html


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-01-13 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-69807225
  
It looks like a number of file permissions changes got mixed into this PR; 
mind reverting those changes?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-01-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-69798754
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25475/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-01-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-69798748
  
  [Test build #25475 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25475/consoleFull)
 for   PR 4021 at commit 
[`28f705c`](https://github.com/apache/spark/commit/28f705c81b7300de0162ddb85387a58a3e075410).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `val classServer   = new 
HttpServer(conf, outputDir, new SecurityManager(conf), classServerPort, "HTTP 
class server")`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-01-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-69788394
  
  [Test build #25475 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25475/consoleFull)
 for   PR 4021 at commit 
[`28f705c`](https://github.com/apache/spark/commit/28f705c81b7300de0162ddb85387a58a3e075410).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-01-13 Thread ilganeli
GitHub user ilganeli opened a pull request:

https://github.com/apache/spark/pull/4021

[SPARK-3885] Provide mechanism to remove accumulators once they are no 
longer used

Instead of storing a strong reference to accumulators, I've replaced this 
with a weak reference and updated any code that uses these accumulators to 
check whether the reference resolves before using the accumulator. A weak 
reference will be cleared when there is no longer an existing copy of the 
variable versus using a soft reference in which case accumulators would only be 
cleared when the GC explicitly ran out of memory. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilganeli/spark SPARK-3885

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4021.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4021


commit a77d11b46c015bb5657496e971a52a853127c7cc
Author: Ilya Ganelin 
Date:   2015-01-07T19:57:16Z

Updated Accumulators class to store weak references instead of strong 
references to allow garbage collection of old accumulators

commit cbb9023a84c1a06ff5b7c918cdb616f10e66d4d1
Author: Ilya Ganelin 
Date:   2015-01-07T19:57:24Z

Merge remote-tracking branch 'upstream/master' into SPARK-3885

commit c49066a63d32f77476a41f71a37bd74c7315f099
Author: Ilya Ganelin 
Date:   2015-01-09T18:32:20Z

Merge remote-tracking branch 'upstream/master' into SPARK-3885

commit 33508522eb58a5159a4318bb435f6c6f3b8ccecc
Author: Ilya Ganelin 
Date:   2015-01-09T18:44:07Z

Updated DAGScheduler and Suite to correctly use new implementation of 
WeakRef Accumulator storage

commit 0746e615bf3cffa2d39923ebfa31b30df075c633
Author: Ilya Ganelin 
Date:   2015-01-09T19:15:13Z

Updated DAGSchedulerSUite to fix bug

commit d78f4bf11465819053482b319f10cacb6e447650
Author: Ilya Ganelin 
Date:   2015-01-10T00:55:18Z

Removed obsolete comment

commit b820ab4b71f12edf5d626cbdbb672a2b26c56182
Author: Ilya Ganelin 
Date:   2015-01-12T15:43:01Z

reset

commit 28f705c81b7300de0162ddb85387a58a3e075410
Author: Ilya Ganelin 
Date:   2015-01-13T17:45:55Z

Merge remote-tracking branch 'upstream/master' into SPARK-3885




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org