[ https://issues.apache.org/jira/browse/SPARK-34689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean R. Owen resolved SPARK-34689.
----------------------------------
    Resolution: Duplicate

> Spark Thrift Server: Memory leak for SparkSession objects
> ---------------------------------------------------------
>
>                 Key: SPARK-34689
>                 URL: https://issues.apache.org/jira/browse/SPARK-34689
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 3.0.1, 3.1.1
>            Reporter: Dimitris Batis
>            Priority: Major
>         Attachments: heap_sparksession.png, heapdump_local_attempt_250_closed_connections.png, test_patch.diff
>
> When running the Spark Thrift Server (3.0.1, standalone cluster), we have noticed that each new JDBC connection creates a new SparkSession object. This object (and anything it references) remains in memory indefinitely even after the JDBC connection is closed, and full GCs do not remove it. After about 18 hours of heavy use, we end up with more than 46,000 such objects (heap_sparksession.png).
> In a small local test installation, I replicated the behavior simply by opening a JDBC connection, executing SHOW SCHEMAS, and closing the connection (heapdump_local_attempt.png). A new SparkSession object is created for each connection and is never removed. I have observed the same behavior in Spark 3.1.1.
> Our settings are as follows. Note that this was occurring even before we added the ExplicitGCInvokesConcurrent option (i.e. it happened even when a full GC was performed every 20 minutes).
> spark-defaults.conf:
> {code}
> spark.master spark://...:7077,...:7077
> spark.master.rest.enabled true
> spark.eventLog.enabled false
> spark.eventLog.dir file:///...
> spark.driver.cores 1
> spark.driver.maxResultSize 4g
> spark.driver.memory 5g
> spark.executor.memory 1g
> spark.executor.logs.rolling.maxRetainedFiles 2
> spark.executor.logs.rolling.strategy size
> spark.executor.logs.rolling.maxSize 1G
> spark.local.dir ...
> spark.sql.ui.retainedExecutions=10
> spark.ui.retainedDeadExecutors=10
> spark.worker.ui.retainedExecutors=10
> spark.worker.ui.retainedDrivers=10
> spark.ui.retainedJobs=30
> spark.ui.retainedStages=100
> spark.ui.retainedTasks=500
> spark.appStateStore.asyncTracking.enable=false
> spark.sql.shuffle.partitions=200
> spark.default.parallelism=200
> spark.task.reaper.enabled=true
> spark.task.reaper.threadDump=false
> spark.memory.offHeap.enabled=true
> spark.memory.offHeap.size=4g
> {code}
> spark-env.sh:
> {code}
> HADOOP_CONF_DIR="/.../hadoop/etc/hadoop"
> SPARK_WORKER_CORES=28
> SPARK_WORKER_MEMORY=54g
> SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=172800 -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=40"
> SPARK_DAEMON_JAVA_OPTS="-Dlog4j.configuration=file:///.../log4j.properties -Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.dir="..." -Dspark.deploy.zookeeper.url=...:2181,...:2181,...:2181 -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=40"
> {code}
> start-thriftserver.sh:
> {code}
> export SPARK_DAEMON_MEMORY=16g
> exec "${SPARK_HOME}"/sbin/spark-daemon.sh submit $CLASS 1 \
>   --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
>   --conf "spark.ui.retainedJobs=30" \
>   --conf "spark.ui.retainedStages=100" \
>   --conf "spark.ui.retainedTasks=500" \
>   --conf "spark.sql.ui.retainedExecutions=10" \
>   --conf "spark.appStateStore.asyncTracking.enable=false" \
>   --conf "spark.cleaner.periodicGC.interval=20min" \
>   --conf "spark.sql.autoBroadcastJoinThreshold=-1" \
>   --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseG1GC -XX:MaxGCPauseMillis=200" \
>   --conf "spark.driver.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:/.../thrift_driver_gc.log -XX:+UseGCLogFileRotation
>   -XX:NumberOfGCLogFiles=7 -XX:GCLogFileSize=35M -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=11990 -XX:+ExplicitGCInvokesConcurrent" \
>   --conf "spark.metrics.namespace=..." --name "..." --packages io.delta:delta-core_2.12:0.7.0 --hiveconf spark.ui.port=4038 --hiveconf spark.cores.max=22 --hiveconf spark.executor.cores=3 --hiveconf spark.executor.memory=6144M --hiveconf spark.scheduler.mode=FAIR --hiveconf spark.scheduler.allocation.file=.../conf/thrift-scheduler.xml \
>   --conf spark.sql.thriftServer.incrementalCollect=true "$@"
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
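The lifecycle the report describes — a SparkSession registered when a JDBC connection opens but never deregistered when it closes — can be modeled with a short, self-contained sketch. This is a schematic illustration only, not actual Spark Thrift Server code; the `Session` stand-in and both manager classes are invented for the example:

```python
import gc
import weakref


class Session:
    """Stand-in for a per-connection SparkSession object."""
    pass


class LeakySessionManager:
    """Models the reported behavior: sessions are registered on open
    but never removed on close, so they stay strongly reachable."""

    def __init__(self):
        self._sessions = {}

    def open_session(self, handle):
        self._sessions[handle] = Session()
        return self._sessions[handle]

    def close_session(self, handle):
        # Modeled bug: the mapping entry is never removed.
        pass


class FixedSessionManager(LeakySessionManager):
    """Expected behavior: closing a connection releases its session."""

    def close_session(self, handle):
        self._sessions.pop(handle, None)


def leaked_after_n_connections(manager, n):
    """Open and close n connections, then count sessions still alive."""
    refs = []
    for handle in range(n):
        session = manager.open_session(handle)
        refs.append(weakref.ref(session))  # track without keeping alive
        manager.close_session(handle)
        del session
    gc.collect()  # a full GC, as in the report, cannot reclaim them
    return sum(1 for r in refs if r() is not None)


print(leaked_after_n_connections(LeakySessionManager(), 250))  # → 250
print(leaked_after_n_connections(FixedSessionManager(), 250))  # → 0
```

The weak references make the retention visible: with the leaky manager all 250 sessions survive a forced collection (matching the heap dump taken after 250 closed connections), while removing the registry entry on close lets every session be reclaimed.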
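To confirm this kind of accumulation on a live Thrift Server, one option is to take periodic `jmap -histo <driver-pid>` snapshots and count retained SparkSession instances over time. The helper below is an illustrative sketch; the function name and the sample histogram text (including its instance counts) are invented for the example:

```python
import re


def count_instances(histo_text, class_pattern=r"org\.apache\.spark\.sql\.SparkSession$"):
    """Sum instance counts from `jmap -histo` output for classes matching
    class_pattern. Histogram rows look like:
        1:       46012      2208576  org.apache.spark.sql.SparkSession
    """
    total = 0
    pattern = re.compile(class_pattern)
    for line in histo_text.splitlines():
        parts = line.split()
        # Expected columns: rank, #instances, #bytes, class name.
        if len(parts) >= 4 and parts[0].rstrip(":").isdigit() and pattern.search(parts[3]):
            total += int(parts[1])
    return total


# Hypothetical snapshot, shaped like real jmap -histo output:
sample = """
 num     #instances         #bytes  class name
----------------------------------------------
   1:         46012        2208576  org.apache.spark.sql.SparkSession
   2:           120          11520  org.apache.spark.sql.SparkSession$Builder
"""

print(count_instances(sample))
```

Comparing the count before and after a batch of open/close cycles (and after a full GC) shows whether closed connections actually release their sessions; a monotonically growing count reproduces the symptom in the report.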