[ https://issues.apache.org/jira/browse/SPARK-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Wendell updated SPARK-4906:
-----------------------------------
    Component/s: Web UI

> Spark master OOMs with exception stack trace stored in JobProgressListener
> --------------------------------------------------------------------------
>
>                 Key: SPARK-4906
>                 URL: https://issues.apache.org/jira/browse/SPARK-4906
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 1.1.1
>            Reporter: Mingyu Kim
>
> Spark master was OOMing with a lot of stack traces retained in
> JobProgressListener. The object dependency goes like the following.
>
> JobProgressListener.stageIdToData => StageUIData.taskData =>
> TaskUIData.errorMessage
>
> Each error message is ~10kb since it has the entire stack trace. As we have a
> lot of tasks, when all of the tasks across multiple stages go bad, these
> error messages accounted for 0.5GB of heap at some point.
>
> Please correct me if I'm wrong, but it looks like all the task info for
> running applications are kept in memory, which means it's almost always bound
> to OOM for long-running applications. Would it make sense to fix this, for
> example, by spilling some UI states to disk?
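
For a sense of scale, the figures in the report (~10 KB per error message, 0.5 GB of heap) work out to roughly 52,000 retained failed tasks. The sketch below models the retention chain from the report as nested Python dicts and compares the retained size with and without truncating each message. It is only an illustration: the dict layout, the helper names (`retained_error_bytes`, `truncate_error`, `build_failed_stages`, `failed_tasks_for`) and the 512-character cap are assumptions made for this example, not Spark code and not the fix adopted for this issue.

```python
# Hypothetical model of the chain described in the report:
#   stageIdToData -> StageUIData.taskData -> TaskUIData.errorMessage
# Key names mirror the report; nothing here is Spark's actual implementation.

KB = 1024
GB = 1024 ** 3


def failed_tasks_for(heap_bytes, message_bytes):
    """How many error messages of a given size fit in heap_bytes."""
    return int(heap_bytes) // int(message_bytes)


def build_failed_stages(num_stages, tasks_per_stage, message):
    """Build a stageIdToData-like mapping in which every task has failed."""
    return {
        stage_id: {
            "taskData": {
                task_id: {"errorMessage": message}
                for task_id in range(tasks_per_stage)
            }
        }
        for stage_id in range(num_stages)
    }


def retained_error_bytes(stage_id_to_data):
    """Sum the logical size (in characters) of every retained error message."""
    return sum(
        len(task["errorMessage"] or "")
        for stage in stage_id_to_data.values()
        for task in stage["taskData"].values()
    )


def truncate_error(message, max_chars=512):
    """Keep only the head of a stack trace (an assumed mitigation)."""
    if message is None or len(message) <= max_chars:
        return message
    return message[:max_chars] + "...(truncated)"


# Stand-in for a ~10 KB stack trace, as described in the report.
trace = "x" * (10 * KB)

# 5 stages x 1000 failed tasks, each holding the full trace.
full_stages = build_failed_stages(5, 1000, trace)
full_bytes = retained_error_bytes(full_stages)

# Same workload, but each message is truncated before being stored.
capped_stages = build_failed_stages(5, 1000, truncate_error(trace))
capped_bytes = retained_error_bytes(capped_stages)

# Number of ~10 KB messages needed to reach 0.5 GB.
tasks_at_half_gb = failed_tasks_for(0.5 * GB, 10 * KB)

print(full_bytes, capped_bytes, tasks_at_half_gb)
```

Truncation only shrinks each entry; the number of retained entries still grows with the number of tasks, so it would not by itself address the unbounded growth the report describes for long-running applications.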