[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...

2017-11-10 Thread tashoyan
Github user tashoyan closed the pull request at:

https://github.com/apache/spark/pull/19700


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...

2017-11-09 Thread tashoyan
Github user tashoyan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19700#discussion_r150100750
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala ---
@@ -101,6 +101,8 @@ class SQLListener(conf: SparkConf) extends 
SparkListener with Logging {
 
   private val retainedExecutions = 
conf.getInt("spark.sql.ui.retainedExecutions", 1000)
 
+  private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
--- End diff --

Done for branch-2.2


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...

2017-11-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19700#discussion_r150063165
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala ---
@@ -101,6 +101,8 @@ class SQLListener(conf: SparkConf) extends 
SparkListener with Logging {
 
   private val retainedExecutions = 
conf.getInt("spark.sql.ui.retainedExecutions", 1000)
 
+  private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
--- End diff --

@tashoyan . Since you are not introducing a new one, could you use the 
existing default value?
```scala
-  private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
+  private val retainedStages =
+conf.getInt("spark.ui.retainedStages", SparkUI.DEFAULT_RETAINED_STAGES)
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...

2017-11-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19700#discussion_r150054899
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala ---
@@ -101,6 +101,8 @@ class SQLListener(conf: SparkConf) extends 
SparkListener with Logging {
 
   private val retainedExecutions = 
conf.getInt("spark.sql.ui.retainedExecutions", 1000)
 
+  private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
--- End diff --

Ah. My bad. Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...

2017-11-09 Thread tashoyan
Github user tashoyan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19700#discussion_r150052324
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala ---
@@ -101,6 +101,8 @@ class SQLListener(conf: SparkConf) extends 
SparkListener with Logging {
 
   private val retainedExecutions = 
conf.getInt("spark.sql.ui.retainedExecutions", 1000)
 
+  private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
--- End diff --

@dongjoon-hyun , It is already documented in the same file configuration.md:
```
How many stages the Spark UI and status APIs remember before garbage 
collecting.
This is a target maximum, and fewer elements may be retained in some 
circumstances.
```
I did not involve a new parameter, I just used an existing one.
Regarding renaming to `spark.sql.ui.retainedStages`, I believe it should be 
done in a separate pull request - if should. This parameter is also used in 
other parts of Spark code, not only SQL.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...

2017-11-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19700#discussion_r150041775
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala ---
@@ -101,6 +101,8 @@ class SQLListener(conf: SparkConf) extends 
SparkListener with Logging {
 
   private val retainedExecutions = 
conf.getInt("spark.sql.ui.retainedExecutions", 1000)
 
+  private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
--- End diff --

BTW, the name should be `spark.sql.ui.retainedStages` instead of 
`spark.ui.retainedStages`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...

2017-11-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19700#discussion_r150041314
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala ---
@@ -101,6 +101,8 @@ class SQLListener(conf: SparkConf) extends 
SparkListener with Logging {
 
   private val retainedExecutions = 
conf.getInt("spark.sql.ui.retainedExecutions", 1000)
 
+  private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
--- End diff --

@tashoyan . Could you add a doc for this like 
`spark.sql.ui.retainedExecutions` here?
Please refer #9052.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...

2017-11-08 Thread tashoyan
GitHub user tashoyan opened a pull request:

https://github.com/apache/spark/pull/19700

[SPARK-22471][SQL] SQLListener consumes much memory causing OutOfMemoryError

## What changes were proposed in this pull request?

This PR addresses the issue 
[SPARK-22471](https://issues.apache.org/jira/browse/SPARK-22471). The modified 
version of `SQLListener` respects the setting `spark.ui.retainedStages` and 
keeps the number of the tracked stages within the specified limit. The hash map 
`_stageIdToStageMetrics` does not outgrow the limit, hence overall memory 
consumption does not grow with time anymore.

## How was this patch tested?

A new unit test covers this fix - see `SQLListenerMemorySuite.scala`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tashoyan/spark SPARK-22471

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19700.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19700


commit 0388f6ce50d568a0493e7959ec005ee5afc20bd0
Author: Arseniy Tashoyan 
Date:   2017-11-08T15:41:36Z

Add reproducer for the issue SPARK-22471

commit 42e80272cf0926f0fd978e6b7617685987d8fc93
Author: Arseniy Tashoyan 
Date:   2017-11-08T15:41:54Z

Add fix for the issue SPARK-22471

commit 2f793ad1f001bc58dd09fa4eaec6ae423445f86f
Author: Arseniy Tashoyan 
Date:   2017-11-08T20:39:02Z

Remove debug print and irrelevant checks. Add a reference to the issue.

commit 4780d95b7d58df741eb8d5756c8109fc7dbfb457
Author: Arseniy Tashoyan 
Date:   2017-11-08T20:47:44Z

Remove debug print and irrelevant checks. Add a reference to the issue.

commit 79c83a715d4a36ad00ff3888e8e2953fcc163d17
Author: Arseniy Tashoyan 
Date:   2017-11-08T21:21:42Z

Collect memory-related tests on SQLListener in the same suite




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org