Hi Jason,

The live UI initializes ElementTrackingStore with an InMemoryStore, so it keeps all UI state on the driver heap and carries an OOM risk.

/**
 * Create an in-memory store for a live application.
 */
def createLiveStore(
    conf: SparkConf,
    appStatusSource: Option[AppStatusSource] = None): AppStatusStore = {
  val store = new ElementTrackingStore(new InMemoryStore(), conf)
  val listener = new AppStatusListener(store, conf, true, appStatusSource)
  new AppStatusStore(store, listener = Some(listener))
}

In addition to the parameters you mentioned, you can try lowering the following:

* spark.ui.retainedTasks
* spark.ui.dagGraph.retainedRootRDDs

If you can share more information about the situation, that would help.

Best,
Qian

> On Aug 3, 2022, at 11:04 AM, Jason Jun <jaes...@gmail.com> wrote:
>
> Hi there,
>
> We have a Spark driver running 24x7, and we are continuously getting OOM in
> the Spark driver every 10 days.
> After analyzing a heap dump, I found that
> org.apache.spark.status.ElementTrackingStore holds 85% of the heap, as in
> this image:
> <image.png>
>
> From this Jira ticket, I found that these parameters could be the root cause:
> https://issues.apache.org/jira/browse/SPARK-26395
> spark.ui.retainedDeadExecutors
> spark.ui.retainedJobs
> spark.ui.retainedStages
>
> But it didn't work. The OOM is only delayed from 1 week to 10 days with
> these changes.
>
> I would really appreciate it if anyone could suggest a solution.
>
> Thanks
> Jason
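
P.S. For reference, here is a minimal sketch of lowering all five retention settings mentioned in this thread through SparkConf. The numeric values are illustrative assumptions, not recommendations; check the defaults for your Spark version and tune from there. The settings need to be in place before the SparkContext is created.

```scala
import org.apache.spark.SparkConf

// Illustrative values only (assumptions); pick limits that fit your workload.
val conf = new SparkConf()
  // Settings from the original message
  .set("spark.ui.retainedJobs", "200")
  .set("spark.ui.retainedStages", "200")
  .set("spark.ui.retainedDeadExecutors", "20")
  // Additional settings suggested in the reply
  .set("spark.ui.retainedTasks", "10000")
  .set("spark.ui.dagGraph.retainedRootRDDs", "100")
```

The same keys can also be passed as --conf options to spark-submit or placed in spark-defaults.conf.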