Hi there,

We have a Spark driver running 24x7, and it keeps hitting an OOM roughly
every 10 days.
After analyzing a heap dump, I found that
org.apache.spark.status.ElementTrackingStore holds 85% of the heap, as shown
in this image:
[image: image.png]

Based on this JIRA ticket, I thought the following parameters might be the
root cause:
https://issues.apache.org/jira/browse/SPARK-26395

   - spark.ui.retainedDeadExecutors
   - spark.ui.retainedJobs
   - spark.ui.retainedStages
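
For reference, here is a rough sketch of how settings like these could be
passed to spark-submit as --conf arguments. The numeric values and the helper
name to_submit_args are only illustrative assumptions, not the exact values
from our job.

```python
# Illustrative values only (assumed for this sketch, not taken from the job).
ui_retention = {
    "spark.ui.retainedJobs": 100,
    "spark.ui.retainedStages": 100,
    "spark.ui.retainedDeadExecutors": 10,
}


def to_submit_args(conf):
    """Render a dict of Spark settings as spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Print the arguments as they would appear on the spark-submit command line.
print(" ".join(to_submit_args(ui_retention)))
```

The same keys could also go into spark-defaults.conf or be set on the
SparkConf before the driver starts.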


But changing them didn't solve the problem. The OOM was only delayed from
1 week to 10 days.

I would really appreciate it if anyone could suggest a solution.

Thanks
Jason
