[jira] [Commented] (SPARK-26395) Spark Thrift server memory leak
[ https://issues.apache.org/jira/browse/SPARK-26395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772058#comment-16772058 ]

Konstantinos Andrikopoulos commented on SPARK-26395:
----------------------------------------------------

Setting the property spark.appStateStore.asyncTracking.enable to false made the situation a bit better in 2 out of our 3 Thrift Server instances. However, according to my understanding, after setting this property to false we shouldn't face this issue at all.

> Spark Thrift server memory leak
> -------------------------------
>
>                 Key: SPARK-26395
>                 URL: https://issues.apache.org/jira/browse/SPARK-26395
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.2
>            Reporter: Konstantinos Andrikopoulos
>            Priority: Major
>
> We are running the Thrift Server in standalone mode and have observed that
> the heap of the driver is constantly increasing. After analysing a heap
> dump, the issue appears to be that the ElementTrackingStore grows without
> bound due to the addition of RDDOperationGraphWrapper objects that are
> never cleaned up.
> The ElementTrackingStore defines the addTrigger method, where you are able
> to set thresholds in order to perform cleanup, but in practice it is only
> used for the ExecutorSummaryWrapper, JobDataWrapper and StageDataWrapper
> classes, via the following Spark properties:
> * spark.ui.retainedDeadExecutors
> * spark.ui.retainedJobs
> * spark.ui.retainedStages
> So the RDDOperationGraphWrapper, which is added in the onJobStart method of
> the AppStatusListener class [kvstore.write(uigraph) #line 291], is not
> cleaned up, and it grows constantly, causing a memory leak.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
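For reference, the workaround and the retention properties discussed in this thread could be set together in spark-defaults.conf. The values below are illustrative assumptions only, not tested recommendations:

{noformat}
# Workaround discussed in this thread: disable async tracking in the
# app state store (illustrative; the property defaults to true)
spark.appStateStore.asyncTracking.enable  false

# Retention limits that drive ElementTrackingStore cleanup triggers
# (example values only)
spark.ui.retainedDeadExecutors  100
spark.ui.retainedJobs           1000
spark.ui.retainedStages         1000
{noformat}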
[jira] [Commented] (SPARK-26395) Spark Thrift server memory leak
[ https://issues.apache.org/jira/browse/SPARK-26395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16769704#comment-16769704 ]

t oo commented on SPARK-26395:
------------------------------

Just wanted to add that I face the same issue in Spark 2.3.0 with the Thrift Server in standalone mode: the heap of the driver constantly increases, and it eventually leads to the Spark Thrift process crashing.
[jira] [Commented] (SPARK-26395) Spark Thrift server memory leak
[ https://issues.apache.org/jira/browse/SPARK-26395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766547#comment-16766547 ]

Marcelo Vanzin commented on SPARK-26395:
----------------------------------------

The code that cleans up stages does clean up the RDD graphs:

{noformat}
if (!hasMoreAttempts) {
  kvstore.delete(classOf[RDDOperationGraphWrapper], s.info.stageId)
}
{noformat}

Are you sure stages are being properly cleaned up in your case? SPARK-25837 could cause stage cleanup to be really slow; that will be fixed in 2.3.3.
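To make the trigger mechanism under discussion concrete, here is a minimal, self-contained Scala sketch of the pattern ElementTrackingStore uses. All class and method names here (TrackingStore, StageWrapper, deleteOldest) are hypothetical simplifications, not Spark's actual API: a trigger pairs a watched class with a threshold, and every write fires any trigger whose count has been exceeded, the way spark.ui.retainedStages bounds StageDataWrapper entries.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for Spark's ElementTrackingStore.
class TrackingStore {
  private val data = mutable.Map.empty[Class[_], mutable.ArrayBuffer[Any]]
  private val triggers = mutable.Map.empty[Class[_], (Long, Long => Unit)]

  // Register a cleanup action that fires when the element count of `klass`
  // exceeds `threshold` (mirrors ElementTrackingStore.addTrigger's intent).
  def addTrigger(klass: Class[_], threshold: Long)(action: Long => Unit): Unit =
    triggers(klass) = (threshold, action)

  // Store a value; afterwards, fire the trigger for its class if over threshold.
  def write(value: Any): Unit = {
    val buf = data.getOrElseUpdate(value.getClass, mutable.ArrayBuffer.empty)
    buf += value
    triggers.get(value.getClass).foreach { case (threshold, action) =>
      if (buf.size > threshold) action(buf.size)
    }
  }

  def count(klass: Class[_]): Int = data.get(klass).map(_.size).getOrElse(0)

  // Drop the oldest `n` elements of the given class.
  def deleteOldest(klass: Class[_], n: Int): Unit =
    data.get(klass).foreach(_.remove(0, n))
}

// Hypothetical stand-in for StageDataWrapper.
case class StageWrapper(stageId: Int)

val store = new TrackingStore
val retainedStages = 3L

// Evict down to the retention limit whenever it is exceeded, analogous to
// what spark.ui.retainedStages does for stage data.
store.addTrigger(classOf[StageWrapper], retainedStages) { count =>
  store.deleteOldest(classOf[StageWrapper], (count - retainedStages).toInt)
}

(1 to 10).foreach(i => store.write(StageWrapper(i)))
// The store now holds only the newest `retainedStages` wrappers.
```

The point of the bug report is that no such trigger is registered for RDDOperationGraphWrapper; its entries are only deleted when the owning stage is cleaned up, so if stage cleanup stalls (as in SPARK-25837), the graph wrappers accumulate.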