Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3009#issuecomment-63604040

Just realized that these phantom stages will lead to memory leaks in the `stageIdToData` and `stageIdToInfo` maps: we only `remove()` entries from these data structures as a side-effect of removing them from other lists / maps. Since these phantom stages never leave the "pending" state, they never enter any of the data structures that drive the cleanup process, so they leak.

I also realized that there's a blocker memory leak in the stable progress reporting API: we never actually remove jobs from the `jobIdToData` map. In fact, there's a somewhat embarrassing typo: _retailed_ jobs:

```scala
val retailedJobs = conf.getInt("spark.ui.retainedJobs", DEFAULT_RETAINED_JOBS)
```

This is my fault: I should have added a test that set `spark.ui.retainedJobs` to some low threshold and checked that we only reported information for that many jobs. This was sloppy on my part, since there was already a corresponding "test LRU eviction of stages" test that I could have just copied and modified.

I wonder if there's a better way to guarantee that we won't leak memory as we add more maps / views to JobProgressListener. I think Guava has some size-limited hash maps with configurable eviction policies; using one of those might be safer than relying on the proper cleanup code being called in the right place.
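For illustration only (not Spark code), here is a minimal sketch of the kind of self-evicting, size-limited map being suggested. It uses the JDK's `LinkedHashMap` with `removeEldestEntry` rather than Guava so it needs no extra dependency; the `retainedJobs` cap and the job-data values are made up for the example:

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Hypothetical cap, standing in for spark.ui.retainedJobs.
val retainedJobs = 3

// A size-limited map: once the cap is exceeded, the eldest entry is
// evicted automatically on insert, so cleanup no longer depends on
// remove() being called in the right place elsewhere.
val jobIdToData = new JLinkedHashMap[Int, String]() {
  override def removeEldestEntry(eldest: JMap.Entry[Int, String]): Boolean =
    size() > retainedJobs
}

// Insert five jobs; the two oldest are evicted automatically.
(1 to 5).foreach(id => jobIdToData.put(id, s"data for job $id"))
println(jobIdToData.keySet()) // prints [3, 4, 5]
```

Guava's `CacheBuilder` with `maximumSize(...)` would give the same guarantee with richer eviction policies (e.g. time-based expiry), at the cost of a library dependency.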