GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/22156
[SPARK-25144][SQL][TEST][BRANCH-2.2] Free aggregate map when task ends

## What changes were proposed in this pull request?

[SPARK-25144](https://issues.apache.org/jira/browse/SPARK-25144) reports memory leaks on Apache Spark 2.0.2 ~ 2.3.2-RC5.

```scala
scala> case class Foo(bar: Option[String])
scala> val ds = List(Foo(Some("bar"))).toDS
scala> val result = ds.flatMap(_.bar).distinct
scala> result.rdd.isEmpty
18/08/19 23:01:54 WARN Executor: Managed memory leak detected; size = 8650752 bytes, TID = 125
res0: Boolean = false
```

This is a backport of cloud-fan's https://github.com/apache/spark/pull/21738 which is a single commit among 3 commits of SPARK-21743. In addition, I added a test case to prevent regressions in branch-2.3 and branch-2.2. Although SPARK-21743 is reverted due to regression, this subpatch can go to branch-2.3 and branch-2.2. This will be merged as cloud-fan's commit.

## How was this patch tested?

Pass the Jenkins with a newly added test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-25144-2.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22156.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22156

----
commit 27dea91f1126ae4b575246d1e17410e79042e9e1
Author: Wenchen Fan <wenchen@...>
Date:   2018-08-20T12:44:22Z

    [SPARK-25144][SQL][TEST][BRANCH-2.2] Free aggregate map when task ends

    [SPARK-25144](https://issues.apache.org/jira/browse/SPARK-25144) reports memory leaks on Apache Spark 2.0.2 ~ 2.3.2-RC5.

    ```scala
    scala> case class Foo(bar: Option[String])
    scala> val ds = List(Foo(Some("bar"))).toDS
    scala> val result = ds.flatMap(_.bar).distinct
    scala> result.rdd.isEmpty
    18/08/19 23:01:54 WARN Executor: Managed memory leak detected; size = 8650752 bytes, TID = 125
    res0: Boolean = false
    ```

    This is a backport of cloud-fan's https://github.com/apache/spark/pull/21738 which is a single commit among 3 commits of SPARK-21743. In addition, I added a test case to prevent regressions in branch-2.3 and branch-2.2. Although SPARK-21743 is reverted due to regression, this subpatch can go to branch-2.3 and branch-2.2. This will be merged as cloud-fan's commit.

    Pass the Jenkins with a newly added test case.

    Closes #22150 from dongjoon-hyun/SPARK-25144.

    Lead-authored-by: Wenchen Fan <wenc...@databricks.com>
    Co-authored-by: Dongjoon Hyun <dongj...@apache.org>
    Signed-off-by: hyukjinkwon <gurwls...@apache.org>

----
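
For readers unfamiliar with the idea in the title, here is a minimal sketch of "free the aggregate map when the task ends". It does not use Spark's classes and is not the patch itself: `SketchTaskContext`, `SketchAggregateMap`, and their methods are hypothetical stand-ins. It assumes the leak occurs when a consumer such as `isEmpty` stops reading before the aggregation output is exhausted, so a free-on-exhaustion path never runs.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a task context that runs callbacks when a task ends.
final class SketchTaskContext {
  private val listeners = ArrayBuffer.empty[() => Unit]

  def addCompletionListener(f: () => Unit): Unit = listeners += f

  // Run every registered callback once, then forget them.
  def markTaskCompleted(): Unit = {
    listeners.foreach(f => f())
    listeners.clear()
  }
}

// Hypothetical stand-in for an aggregation map that holds managed memory.
final class SketchAggregateMap(ctx: SketchTaskContext, bytes: Long) {
  private var allocated: Long = bytes

  // Register cleanup at construction time, so the memory is released even if
  // the consumer never reads the map's output to the end.
  ctx.addCompletionListener(() => free())

  // Idempotent: calling free() more than once is harmless.
  def free(): Unit = {
    allocated = 0L
  }

  def allocatedBytes: Long = allocated
}

object FreeOnTaskEndSketch {
  def main(args: Array[String]): Unit = {
    val ctx = new SketchTaskContext
    val map = new SketchAggregateMap(ctx, 8650752L)

    // The consumer stops early here (nothing drains the map), then the task ends.
    ctx.markTaskCompleted()

    println(map.allocatedBytes) // prints 0
  }
}
```

Registering the cleanup when the map is created, rather than relying on the reader reaching the end of the output, is the design choice this sketch is meant to illustrate.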