[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...
Github user tashoyan closed the pull request at: https://github.com/apache/spark/pull/19700

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user tashoyan commented on a diff in the pull request: https://github.com/apache/spark/pull/19700#discussion_r150100750

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala

```diff
@@ -101,6 +101,8 @@ class SQLListener(conf: SparkConf) extends SparkListener with Logging {
   private val retainedExecutions = conf.getInt("spark.sql.ui.retainedExecutions", 1000)
+  private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
```

Done for branch-2.2
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19700#discussion_r150063165

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala

```diff
@@ -101,6 +101,8 @@ class SQLListener(conf: SparkConf) extends SparkListener with Logging {
   private val retainedExecutions = conf.getInt("spark.sql.ui.retainedExecutions", 1000)
+  private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
```

@tashoyan. Since you are not introducing a new one, could you use the existing default value?

```scala
- private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
+ private val retainedStages =
+   conf.getInt("spark.ui.retainedStages", SparkUI.DEFAULT_RETAINED_STAGES)
```
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19700#discussion_r150054899

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala

```diff
@@ -101,6 +101,8 @@ class SQLListener(conf: SparkConf) extends SparkListener with Logging {
   private val retainedExecutions = conf.getInt("spark.sql.ui.retainedExecutions", 1000)
+  private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
```

Ah. My bad. Thanks.
Github user tashoyan commented on a diff in the pull request: https://github.com/apache/spark/pull/19700#discussion_r150052324

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala

```diff
@@ -101,6 +101,8 @@ class SQLListener(conf: SparkConf) extends SparkListener with Logging {
   private val retainedExecutions = conf.getInt("spark.sql.ui.retainedExecutions", 1000)
+  private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
```

@dongjoon-hyun, it is already documented in configuration.md:

```
How many stages the Spark UI and status APIs remember before garbage collecting.
This is a target maximum, and fewer elements may be retained in some circumstances.
```

I did not introduce a new parameter; I just used an existing one. Regarding renaming it to `spark.sql.ui.retainedStages`, I believe that should be done in a separate pull request - if at all. This parameter is also used in other parts of the Spark code, not only SQL.
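The "target maximum" wording quoted above reflects that UI listeners typically evict in batches rather than one element at a time, so after a trim the tracked collection can hold fewer entries than the configured limit. A minimal sketch of that batch-trim pattern (illustrative names and trim sizing; not the actual Spark code):

```scala
import scala.collection.mutable

// Illustrative batch trim: once the limit is exceeded, drop the larger of
// (10% of the limit) or the current overflow in one pass. The map can
// therefore end up below the configured maximum - a "target maximum".
def trimToTarget[K, V](map: mutable.LinkedHashMap[K, V], retained: Int): Unit = {
  if (map.size > retained) {
    val toRemove = math.max(retained / 10, map.size - retained)
    // LinkedHashMap iterates in insertion order, so this evicts the oldest keys.
    map.keys.take(toRemove).toList.foreach(map -= _)
  }
}
```

For example, with a limit of 100, inserting a 101st entry would trim away the 10 oldest entries, leaving 91 - fewer than the limit until the map fills up again.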
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19700#discussion_r150041775

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala

```diff
@@ -101,6 +101,8 @@ class SQLListener(conf: SparkConf) extends SparkListener with Logging {
   private val retainedExecutions = conf.getInt("spark.sql.ui.retainedExecutions", 1000)
+  private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
```

BTW, the name should be `spark.sql.ui.retainedStages` instead of `spark.ui.retainedStages`.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19700#discussion_r150041314

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala

```diff
@@ -101,6 +101,8 @@ class SQLListener(conf: SparkConf) extends SparkListener with Logging {
   private val retainedExecutions = conf.getInt("spark.sql.ui.retainedExecutions", 1000)
+  private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
```

@tashoyan. Could you add a doc for this, like the one for `spark.sql.ui.retainedExecutions`, here? Please refer to #9052.
GitHub user tashoyan opened a pull request: https://github.com/apache/spark/pull/19700

[SPARK-22471][SQL] SQLListener consumes much memory causing OutOfMemoryError

## What changes were proposed in this pull request?

This PR addresses the issue [SPARK-22471](https://issues.apache.org/jira/browse/SPARK-22471). The modified version of `SQLListener` respects the setting `spark.ui.retainedStages` and keeps the number of tracked stages within the specified limit. The hash map `_stageIdToStageMetrics` does not outgrow the limit, hence overall memory consumption no longer grows over time.

## How was this patch tested?

A new unit test covers this fix - see `SQLListenerMemorySuite.scala`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tashoyan/spark SPARK-22471

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19700.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19700

Commits:

- 0388f6ce50d568a0493e7959ec005ee5afc20bd0 - Arseniy Tashoyan, 2017-11-08T15:41:36Z
  Add reproducer for the issue SPARK-22471
- 42e80272cf0926f0fd978e6b7617685987d8fc93 - Arseniy Tashoyan, 2017-11-08T15:41:54Z
  Add fix for the issue SPARK-22471
- 2f793ad1f001bc58dd09fa4eaec6ae423445f86f - Arseniy Tashoyan, 2017-11-08T20:39:02Z
  Remove debug print and irrelevant checks. Add a reference to the issue.
- 4780d95b7d58df741eb8d5756c8109fc7dbfb457 - Arseniy Tashoyan, 2017-11-08T20:47:44Z
  Remove debug print and irrelevant checks. Add a reference to the issue.
- 79c83a715d4a36ad00ff3888e8e2953fcc163d17 - Arseniy Tashoyan, 2017-11-08T21:21:42Z
  Collect memory-related tests on SQLListener in the same suite
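The mechanism the PR description outlines - bound the `_stageIdToStageMetrics` map by the `spark.ui.retainedStages` limit so it cannot grow without bound - can be sketched as follows. This is a simplified illustration, not the actual Spark code: the class name, the `String` metrics payload, and the one-at-a-time eviction are assumptions for the example; only the `_stageIdToStageMetrics` and `retainedStages` names come from the PR.

```scala
import scala.collection.mutable

// Simplified sketch of a listener that caps its stage-metrics map at a
// configured limit, evicting the oldest stage ids once the limit is exceeded.
class BoundedStageTracker(retainedStages: Int) {
  // LinkedHashMap preserves insertion order, so head is the oldest stage.
  private val _stageIdToStageMetrics = mutable.LinkedHashMap.empty[Int, String]

  def onStageSubmitted(stageId: Int, metrics: String): Unit = synchronized {
    _stageIdToStageMetrics(stageId) = metrics
    trimStagesIfNecessary()
  }

  private def trimStagesIfNecessary(): Unit = {
    while (_stageIdToStageMetrics.size > retainedStages) {
      _stageIdToStageMetrics -= _stageIdToStageMetrics.head._1
    }
  }

  def trackedStageCount: Int = synchronized(_stageIdToStageMetrics.size)
}
```

With this bound in place, submitting stages indefinitely keeps the map at `retainedStages` entries instead of growing until the driver runs out of heap - which is the failure mode SPARK-22471 reports.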