[ https://issues.apache.org/jira/browse/SPARK-26395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766547#comment-16766547 ]
Marcelo Vanzin commented on SPARK-26395:
----------------------------------------

The code that cleans up stages does clean up the RDD graphs:

{noformat}
if (!hasMoreAttempts) {
  kvstore.delete(classOf[RDDOperationGraphWrapper], s.info.stageId)
}
{noformat}

Are you sure stages are being properly cleaned up in your case? SPARK-25837 could cause stage cleanup to be really slow; that will be fixed in 2.3.3.

> Spark Thrift server memory leak
> -------------------------------
>
>                 Key: SPARK-26395
>                 URL: https://issues.apache.org/jira/browse/SPARK-26395
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.2
>            Reporter: Konstantinos Andrikopoulos
>            Priority: Major
>
> We are running Thrift Server in standalone mode and have observed that the
> driver heap is constantly increasing. After analysing the heap dump, the
> issue appears to be that the ElementTrackingStore grows continuously due to
> the addition of RDDOperationGraphWrapper objects that are never cleaned up.
>
> The ElementTrackingStore defines an addTrigger method where you can set
> thresholds for performing cleanup, but in practice it is only used for the
> ExecutorSummaryWrapper, JobDataWrapper and StageDataWrapper classes, via the
> following Spark properties:
> * spark.ui.retainedDeadExecutors
> * spark.ui.retainedJobs
> * spark.ui.retainedStages
>
> So the RDDOperationGraphWrapper, which is added by the onJobStart method of
> the AppStatusListener class [kvstore.write(uigraph), line 291], is not
> cleaned up; it grows continuously, causing a memory leak.
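The threshold-based cleanup pattern described above can be sketched as a minimal, self-contained model. Note this is a simplification for illustration only: TrackingStore, GraphWrapper, and the retained threshold below are stand-ins, not Spark's actual ElementTrackingStore or KVStore classes, although the addTrigger signature mirrors the one the description refers to.

```scala
import scala.collection.mutable

// Simplified stand-in for ElementTrackingStore: a per-class buffer of
// written elements, plus an optional cleanup trigger per class.
class TrackingStore {
  private val data = mutable.Map.empty[Class[_], mutable.Buffer[Any]]
  private val triggers = mutable.Map.empty[Class[_], (Long, Long => Unit)]

  // Register a cleanup action that fires once the element count for
  // `klass` exceeds `threshold` (mirrors addTrigger's shape).
  def addTrigger(klass: Class[_], threshold: Long)(action: Long => Unit): Unit =
    triggers(klass) = (threshold, action)

  def write(value: Any): Unit = {
    val buf = data.getOrElseUpdate(value.getClass, mutable.Buffer.empty)
    buf += value
    triggers.get(value.getClass).foreach { case (threshold, action) =>
      if (buf.size > threshold) action(buf.size.toLong)
    }
  }

  // Drop the n oldest elements of the given class.
  def delete(klass: Class[_], n: Int): Unit =
    data.get(klass).foreach(buf => buf.remove(0, math.min(n, buf.size)))

  def count(klass: Class[_]): Int = data.get(klass).map(_.size).getOrElse(0)
}

// Stand-in for RDDOperationGraphWrapper.
case class GraphWrapper(stageId: Int)

object Demo {
  def run(): Int = {
    val store = new TrackingStore
    val retained = 2L  // analogous to a spark.ui.retained* setting

    // The fix the description is asking for: a trigger that evicts the
    // oldest wrappers once the retained threshold is exceeded.
    store.addTrigger(classOf[GraphWrapper], retained) { count =>
      store.delete(classOf[GraphWrapper], (count - retained).toInt)
    }

    (1 to 5).foreach(i => store.write(GraphWrapper(i)))
    store.count(classOf[GraphWrapper])  // capped at `retained`
  }

  def main(args: Array[String]): Unit = println(Demo.run())
}
```

Without such a trigger registered for GraphWrapper, every write accumulates in the store, which is the unbounded-growth behaviour reported for RDDOperationGraphWrapper.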