[ https://issues.apache.org/jira/browse/SPARK-11022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-11022:
------------------------------
    Priority: Minor  (was: Major)

Can you update the title to be clearer about the cause and resolution? You are specifically suggesting that the list of executors needs to be garbage collected. (Do you really have 17K executors, most of which are dead, in one app?)

> Spark Worker process find Memory leaking after long time running
> ----------------------------------------------------------------
>
>                 Key: SPARK-11022
>                 URL: https://issues.apache.org/jira/browse/SPARK-11022
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.4.0
>            Reporter: colin shaw
>            Priority: Minor
>
> The Worker process often goes down even when there are no abnormal tasks; it simply crashes without any message. After adding "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${SPARK_HOME}/logs", the resulting heap dump shows "17,010 instances of org.apache.spark.deploy.worker.ExecutorRunner, loaded by sun.misc.Launcher$AppClassLoader @ 0xe2abfcc8, occupying 496,706,920 (96.14%) bytes."
> Almost all of these instances are held by a single org.apache.spark.deploy.worker.Worker instance: its finishedExecutors field retains many ExecutorRunner objects.
> The code in Worker.scala only ever adds entries ("finishedExecutors(fullId) = executor") and reads them ("finishedExecutors.values.toList"); nothing ever removes finished executors, so they all accumulate in memory and, after running for a long time, the Worker crashes. (See the sketch below for one way to bound this map.)
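To illustrate one way the growth could be bounded, here is a minimal Scala sketch that trims the oldest entries from a finishedExecutors-style map once it exceeds a retained limit. The retainedExecutors cap, the ExecutorRunnerStub class, and the trimFinishedExecutorsIfNecessary helper are hypothetical names used only for illustration; this is not the actual Worker.scala implementation.

{code:scala}
// Minimal sketch (not the actual Spark fix) of bounding the Worker's
// finished-executor history so it cannot grow without limit.
import scala.collection.mutable

object FinishedExecutorTrimSketch {

  // Stand-in for org.apache.spark.deploy.worker.ExecutorRunner.
  final case class ExecutorRunnerStub(fullId: String)

  // LinkedHashMap keeps insertion order, so the oldest entries can be dropped first.
  private val finishedExecutors = mutable.LinkedHashMap.empty[String, ExecutorRunnerStub]

  // Hypothetical cap on how many finished executors to keep in memory.
  private val retainedExecutors = 1000

  def executorFinished(fullId: String, executor: ExecutorRunnerStub): Unit = {
    finishedExecutors(fullId) = executor
    trimFinishedExecutorsIfNecessary()
  }

  private def trimFinishedExecutorsIfNecessary(): Unit = {
    val excess = finishedExecutors.size - retainedExecutors
    if (excess > 0) {
      // take(...) builds a new collection, so removing from the original map
      // while iterating over its keys is safe.
      finishedExecutors.take(excess).keys.foreach(finishedExecutors.remove)
    }
  }
}
{code}

Keeping a bounded history, rather than dropping entries as soon as an executor finishes, would preserve recent finished-executor information for the Worker while keeping memory use roughly constant.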