[ https://issues.apache.org/jira/browse/SPARK-27337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476829#comment-17476829 ]

Stanislav Savulchik commented on SPARK-27337:
---------------------------------------------

Hi,

I found this ticket while investigating an apparent memory leak in a 
long-running Spark 3.1.1 driver (a Java process) that executes various 
jobs submitted by an external scheduler.

I took a heap dump (jmap -dump:live,file=dump.hprof <pid>) during an idle 
period when no jobs were running and opened it with Eclipse Memory 
Analyzer. The picture was similar to the one posted by [~vinooganesh].

[^Screenshot 2022-01-16 at 23.16.10.png]
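For reference, a comparable live-object dump can also be taken from inside the driver JVM itself, for example from a scheduler hook during an idle period. The sketch below is a minimal Scala script, assuming a HotSpot-based JDK (it uses the JDK-specific `com.sun.management.HotSpotDiagnosticMXBean`); the helper name `dumpLiveHeap` and the temporary output path are illustrative only.

```scala
import java.lang.management.ManagementFactory
import java.nio.file.{Files, Path}
import com.sun.management.HotSpotDiagnosticMXBean

// Hypothetical helper: writes a heap dump of the current JVM to `target`,
// roughly what `jmap -dump:live,file=... <pid>` does from outside.
def dumpLiveHeap(target: Path): Path = {
  // HotSpot-specific platform MXBean; not available on every JVM.
  val bean = ManagementFactory.getPlatformMXBean(classOf[HotSpotDiagnosticMXBean])
  // live = true: include only objects that are still reachable.
  // The target file must not exist yet, and recent JDKs reject
  // file names that do not end in ".hprof".
  bean.dumpHeap(target.toString, true)
  target
}

val dumpDir  = Files.createTempDirectory("driver-heap")
val dumpFile = dumpLiveHeap(dumpDir.resolve("dump.hprof"))
println(s"Heap dump written to $dumpFile (${Files.size(dumpFile)} bytes)")
```

The resulting `.hprof` file can be opened in Eclipse Memory Analyzer in the same way as a dump taken with jmap.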

Every submitted job is given a fresh SparkSession instance created with the 
SparkSession#newSession method. After a job is done, its SparkSession instance 
is no longer referenced and is expected to be garbage collected along with all 
of its accumulated session state.

Apparently, in some cases old SparkSessions are still referenced from 
AsyncEventQueue, more specifically from ExecutionListenerBus instances that 
are still registered in a listener queue. This holds even after manual 
System.gc() calls and the periodic ones made by the Spark ContextCleaner.

I tried to correlate this with the Spark driver metrics, and my current guess 
is that these ExecutionListenerBus instances get stuck because of dropped 
events on the _shared_ queue.
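To illustrate what "dropped events" means here: Spark's listener bus delivers events through bounded per-queue buffers (sized by `spark.scheduler.listenerbus.eventqueue.capacity`, 10000 by default), and an event posted while a queue is full is discarded instead of blocking the poster. The toy Scala script below models only that mechanism; `ToyEventQueue` is a hypothetical class, not Spark's AsyncEventQueue. Whether such a lost event is what leaves an ExecutionListenerBus registered is exactly the hypothesis stated above, and the sketch does not confirm it.

```scala
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.atomic.AtomicLong
import scala.jdk.CollectionConverters._

// Hypothetical, simplified stand-in for one bounded listener-bus queue.
// Posting never blocks: if the queue is full, the event is discarded
// and a drop counter is incremented.
final class ToyEventQueue(capacity: Int) {
  private val queue = new LinkedBlockingQueue[String](capacity)
  val droppedEvents = new AtomicLong(0L)

  /** Returns true if the event was enqueued, false if it was dropped. */
  def post(event: String): Boolean = {
    val accepted = queue.offer(event) // non-blocking; false when full
    if (!accepted) droppedEvents.incrementAndGet()
    accepted
  }

  /** Removes up to `max` pending events, as a dispatcher thread would. */
  def drain(max: Int): List[String] = {
    val buffer = new java.util.ArrayList[String]()
    queue.drainTo(buffer, max)
    buffer.asScala.toList
  }
}

// Capacity 3 and no dispatcher draining yet, i.e. the listeners cannot
// keep up with the rate at which events are posted.
val shared   = new ToyEventQueue(capacity = 3)
val accepted = (1 to 5).map(i => shared.post(s"event-$i"))
val handled  = shared.drain(max = 10)

println(s"accepted=$accepted dropped=${shared.droppedEvents.get} handled=$handled")
```

Events 4 and 5 never reach any listener; in Spark such losses show up as dropped-event counts in the driver metrics for the affected queue.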

I would appreciate it if anyone could verify my reasoning. Thank you.

> QueryExecutionListener never cleans up listeners from the bus after 
> SparkSession is cleared
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-27337
>                 URL: https://issues.apache.org/jira/browse/SPARK-27337
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Vinoo Ganesh
>            Priority: Major
>         Attachments: Screenshot 2022-01-16 at 23.16.10.png, image001-1.png
>
>
> As a result of 
> [https://github.com/apache/spark/commit/9690eba16efe6d25261934d8b73a221972b684f3],
> it looks like there is a memory leak (specifically 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L131]).
>  
> Because the listener bus on the context still holds a reference to the 
> listener (even after the SparkSession is cleared), the listeners are never 
> cleaned up. This means that if you close and recreate Spark sessions 
> frequently, you leak memory every single time. 
>  
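The retention pattern described in the issue can be reproduced in a small, Spark-free Scala script. All class and method names below are hypothetical illustrations, not Spark's ExecutionListenerBus or listener-bus API: a context-wide bus holds a strong reference to one listener per session, so every session the caller creates and then discards stays reachable through the bus. The second variant unregisters the listener when the job finishes, which is the kind of cleanup the issue title says is missing.

```scala
import scala.collection.mutable

// Hypothetical stand-ins, not Spark classes: a session with accumulated
// state, a per-session listener that strongly references its session,
// and a context-wide bus shared by every session.
final class ToySession(val id: Int) {
  val accumulatedState = new Array[Byte](1024) // placeholder for session state
}
final class ToySessionListener(val session: ToySession)

final class ToySharedBus {
  private val listeners = mutable.ArrayBuffer.empty[ToySessionListener]
  def add(listener: ToySessionListener): Unit = listeners += listener
  def remove(listener: ToySessionListener): Unit = listeners -= listener
  def size: Int = listeners.size
  // Sessions that remain strongly reachable through the bus.
  def retainedSessionIds: Seq[Int] = listeners.map(_.session.id).toSeq
}

// Leaky pattern: every job gets a fresh session whose listener is
// registered on the shared bus and never unregistered.
def runJobsLeaky(bus: ToySharedBus, jobs: Int): Unit =
  (1 to jobs).foreach { i =>
    val session = new ToySession(i)
    bus.add(new ToySessionListener(session))
    // The job finishes and `session` goes out of scope, but the bus still
    // references the listener and, through it, the session.
  }

// Same pattern with explicit cleanup once the job is done.
def runJobsWithCleanup(bus: ToySharedBus, jobs: Int): Unit =
  (1 to jobs).foreach { i =>
    val listener = new ToySessionListener(new ToySession(i))
    bus.add(listener)
    try {
      // ... run the job with listener.session ...
    } finally bus.remove(listener)
  }

val leakyBus = new ToySharedBus
runJobsLeaky(leakyBus, jobs = 100)

val cleanBus = new ToySharedBus
runJobsWithCleanup(cleanBus, jobs = 100)

// prints: listeners retained: leaky=100, with cleanup=0
println(s"listeners retained: leaky=${leakyBus.size}, with cleanup=${cleanBus.size}")
```

The explicit try/finally keeps the model deterministic; an alternative design is to track sessions with weak references and unregister listeners from a cleaner thread, at the cost of depending on when the garbage collector actually runs.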



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
