[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter Leak

2018-11-23 Thread advancedxy
Github user advancedxy commented on the issue:

https://github.com/apache/spark/pull/23083
  
> For the task completion listener, I think it's an overkill to introduce a 
new API, do you know where exactly we leak the memory? and can we null it out 
when the ShuffleBlockFetcherIterator reaches to its end?

If I understand correctly, the memory is leaked because the external sorter is 
referenced by a `TaskCompletionListener`, so it can only be garbage-collected 
once the task completes. However, for `coalesce` and similar APIs, multiple 
`BlockStoreShuffleReader`s are created because there are multiple input 
sources, and none of their internal sorters is released until all the shuffle 
readers have been consumed and the task has finished.
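A toy model of the mechanism described above might help. It assumes, as the comment says, that each reader's sorter is released only by a task completion listener whose closure captures it. `ToyTaskContext`, `ToySorter` and `CoalesceLeakDemo` are hypothetical names, not Spark classes, and the real code paths are more involved.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a task context whose listeners run only when the
// whole task finishes (a simplified model, not Spark's TaskContext API).
final class ToyTaskContext {
  private val listeners = ArrayBuffer.empty[() => Unit]
  def addCompletionListener(f: () => Unit): Unit = listeners += f
  def completeTask(): Unit = { listeners.foreach(f => f()); listeners.clear() }
}

// Hypothetical stand-in for a sorter holding a large in-memory buffer.
final class ToySorter(size: Int) {
  private var buffer: Array[Byte] = new Array[Byte](size)
  def stop(): Unit = buffer = null
  def holdsMemory: Boolean = buffer != null
}

object CoalesceLeakDemo {
  // One task reads `inputs` shuffle sources one after another, as coalesce does.
  // Each reader's sorter is freed only by its completion listener.
  def run(inputs: Int): (Int, Int) = {
    val ctx = new ToyTaskContext
    val sorters = (1 to inputs).map { _ =>
      val sorter = new ToySorter(1 << 20)
      ctx.addCompletionListener(() => sorter.stop()) // closure captures the sorter
      sorter // assume this reader is fully consumed before the next one starts
    }
    // Every reader is done, yet every buffer is still held.
    val heldBeforeTaskEnd = sorters.count(_.holdsMemory)
    ctx.completeTask()
    val heldAfterTaskEnd = sorters.count(_.holdsMemory)
    (heldBeforeTaskEnd, heldAfterTaskEnd)
  }
}
```

With three inputs, `CoalesceLeakDemo.run(3)` reports all three buffers held until the task completes and none afterwards, which is the accumulation pattern described for `coalesce`.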

It's overkill to introduce a new API. However, I think we can limit it to 
`private[spark]` scope. 
Like @szhem, I haven't figured out another way to null out the sorter 
reference yet.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter Leak

2018-11-23 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23083
  
Looking at the code, we are trying to fix 2 memory leaks: the task 
completion listener in `ShuffleBlockFetcherIterator`, and the 
`CompletionIterator`. If that's the case, can you say so in the PR description?
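A minimal sketch of how the second leak could be addressed, assuming the problem is that a completion wrapper keeps its exhausted source iterator (and whatever buffers that iterator references) reachable. `ReleasingCompletionIterator` is a hypothetical name, not Spark's `CompletionIterator`; it swaps the exhausted source for `Iterator.empty` before running its completion callback.

```scala
// Hypothetical sketch, not Spark's CompletionIterator.
final class ReleasingCompletionIterator[A](sub: Iterator[A], onComplete: () => Unit)
    extends Iterator[A] {
  // `sub` is only read during construction, so the compiler need not keep it
  // as a field; `iter` is the wrapper's only reference to the source.
  private var iter: Iterator[A] = sub
  private var completed = false

  override def next(): A = iter.next()

  override def hasNext: Boolean = {
    val r = iter.hasNext
    if (!r && !completed) {
      completed = true
      // Drop the exhausted source early instead of holding it until the
      // wrapper itself becomes unreachable.
      iter = Iterator.empty
      onComplete()
    }
    r
  }

  def isCompleted: Boolean = completed
}
```

The callback runs exactly once, on the first `hasNext` that returns `false`; later calls see the empty iterator and do nothing.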

For the task completion listener, I think it's overkill to introduce a 
new API. Do we know exactly where we leak the memory? And can we null it out 
when the `ShuffleBlockFetcherIterator` reaches its end?
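One way to read the suggestion to null the reference out at end-of-iteration, sketched with hypothetical classes (`CleanupRegistry`, `Resource` and `SelfCleaningIterator` are not Spark APIs): the completion callback stays registered as a safety net for early termination, but the iterator drops its own reference to the heavy resource as soon as it is exhausted, so the registry only pins a small wrapper until the task ends.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical per-task registry of cleanup callbacks (not Spark's TaskContext).
final class CleanupRegistry {
  private val callbacks = ArrayBuffer.empty[() => Unit]
  def register(f: () => Unit): Unit = callbacks += f
  def runAll(): Unit = callbacks.foreach(f => f())
}

// Hypothetical resource holding a large buffer.
final class Resource(size: Int) {
  private var data: Array[Byte] = new Array[Byte](size)
  def release(): Unit = data = null
  def isReleased: Boolean = data == null
}

// Yields 1..n while owning a Resource. The registered callback captures the
// iterator, not the Resource, and the field is nulled once iteration ends.
final class SelfCleaningIterator(n: Int, registry: CleanupRegistry) extends Iterator[Int] {
  private var resource: Resource = new Resource(1 << 20)
  private var i = 0
  registry.register(() => cleanup()) // safety net if the task ends early

  private def cleanup(): Unit =
    if (resource != null) {
      resource.release()
      resource = null // drop the reference so it can be garbage-collected
    }

  override def hasNext: Boolean = {
    val r = i < n
    if (!r) cleanup() // end reached: release without waiting for the task
    r
  }

  override def next(): Int = {
    if (!hasNext) throw new NoSuchElementException("exhausted")
    i += 1
    i
  }

  def holdsResource: Boolean = resource != null
}
```

Because `cleanup()` is idempotent, running the registered callbacks at task end is harmless for iterators that were already fully consumed.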


---




[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter Leak

2018-11-21 Thread szhem
Github user szhem commented on the issue:

https://github.com/apache/spark/pull/23083
  
> So do you mean CoGroupRDDs with multiple input sources will have similar 
problems?

Yep, but slightly different ones.

> If so, can you create another Jira?

Will do it shortly.


---




[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter Leak

2018-11-21 Thread advancedxy
Github user advancedxy commented on the issue:

https://github.com/apache/spark/pull/23083
  
And another thing:
> P.S. This PR does not cover cases with CoGroupedRDDs which use 
ExternalAppendOnlyMap internally, which itself can lead to OutOfMemoryErrors in 
many places.

So do you mean CoGroupedRDDs with multiple input sources will have similar 
problems? If so, can you create another JIRA?


---




[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter Leak

2018-11-20 Thread szhem
Github user szhem commented on the issue:

https://github.com/apache/spark/pull/23083
  
Hi @davies, @advancedxy, @rxin,
You seem to be the last ones who touched the corresponding parts of the 
files in this PR.
Could you be so kind as to take a look at it?


---




[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter Leak

2018-11-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23083
  
**[Test build #4433 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4433/testReport)**
 for PR 23083 at commit 
[`12075ec`](https://github.com/apache/spark/commit/12075ec265f0d09cd52865bb91155898b9ede523).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter Leak

2018-11-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23083
  
**[Test build #4433 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4433/testReport)**
 for PR 23083 at commit 
[`12075ec`](https://github.com/apache/spark/commit/12075ec265f0d09cd52865bb91155898b9ede523).


---




[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter Leak

2018-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23083
  
Can one of the admins verify this patch?


---



