[ https://issues.apache.org/jira/browse/SPARK-35262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335343#comment-17335343 ]
Fu Chen commented on SPARK-35262:
---------------------------------

[~iamelin] This should be a duplicate of SPARK-34087, which has been fixed by PR-31919.

Spark 3.1.1 has a memory leak when the SparkSession is cloned. When you disable both `spark.sql.sources.bucketing.autoBucketedScan.enabled` and `spark.sql.adaptive.enabled`, the CacheManager caches the query using the original SparkSession (that is, Spark does not clone the session).

> Memory leak when dataset is being persisted
> -------------------------------------------
>
>                 Key: SPARK-35262
>                 URL: https://issues.apache.org/jira/browse/SPARK-35262
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.1
>            Reporter: Igor Amelin
>            Priority: Critical
>
> If a Java or Scala application with a SparkSession runs for a long time and
> persists a lot of datasets, it can crash because of a memory leak.
> I've noticed the following. When we have a dataset and persist it, the
> SparkSession used to load that dataset is cloned in CacheManager, and this
> clone is added as a listener to `listenersPlusTimers` in `ListenerBus`. But
> this clone isn't removed from the list of listeners afterwards, e.g. when
> the dataset is unpersisted. If we persist a lot of datasets, the SparkSession
> is cloned and added to `ListenerBus` many times. This leads to a memory leak
> since the `listenersPlusTimers` list becomes very large.
> I've found out that the SparkSession is cloned in CacheManager when the
> parameters `spark.sql.sources.bucketing.autoBucketedScan.enabled` and
> `spark.sql.adaptive.enabled` are true. The first one is true by default, and
> this default behavior leads to the problem. When auto bucketed scan is
> disabled, the SparkSession isn't cloned, and there are no duplicates in
> ListenerBus, so the memory leak doesn't occur.
> Here is a small Java application to reproduce the memory leak:
> [https://github.com/iamelin/spark-memory-leak]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
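
For reference, the workaround described in the comment (disabling both settings so that CacheManager keeps using the original SparkSession) could be expressed as a configuration fragment along these lines. This is only a sketch: it assumes the settings are supplied through `spark-defaults.conf`, which is a choice made here for illustration and is not stated in the thread. Disabling these options also turns off the corresponding features, so it is a stopgap until a build containing the fix is available.

```properties
# Sketch of the workaround from the comment above (assumed to live in spark-defaults.conf).
# With both settings off, CacheManager is reported not to clone the SparkSession.
spark.sql.sources.bucketing.autoBucketedScan.enabled  false
spark.sql.adaptive.enabled                            false
```

The same two keys can also be passed per application, for example with `--conf key=value` on `spark-submit`.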