Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3009#issuecomment-63604040
  
    Just realized that these phantom stages will lead to memory leaks in the 
`stageIdToData` and `stageIdToInfo` maps: we only `remove()` entries from these 
data structures as a side-effect of removing entries from other lists / maps. 
Since these phantom stages never leave the "pending" state, they never enter 
any of the data structures that drive the cleanup process, so their entries 
are never removed.
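
    To make the pattern concrete, here's a rough sketch of what I mean (simplified 
and illustrative, not the actual listener code; `StageInfo` / `StageUIData` here 
are stand-in types):
    
    ```scala
      import scala.collection.mutable

      // Stand-in types, for illustration only.
      case class StageInfo(stageId: Int, name: String)
      case class StageUIData(numCompletedTasks: Int = 0)

      class ListenerSketch(retainedStages: Int) {
        val pendingStages = mutable.HashMap[Int, StageInfo]()
        val completedStages = mutable.ListBuffer[StageInfo]()
        val stageIdToData = mutable.HashMap[Int, StageUIData]()
        val stageIdToInfo = mutable.HashMap[Int, StageInfo]()

        def onStageSubmitted(stage: StageInfo): Unit = {
          pendingStages(stage.stageId) = stage
          stageIdToInfo(stage.stageId) = stage
          stageIdToData(stage.stageId) = StageUIData()
        }

        private def trimIfNecessary(stages: mutable.ListBuffer[StageInfo]): Unit = {
          if (stages.size > retainedStages) {
            val toRemove = stages.take(stages.size - retainedStages)
            toRemove.foreach { s =>
              // The only place the per-stage maps are cleaned up: as a
              // side-effect of trimming the completed / failed lists.
              stageIdToData.remove(s.stageId)
              stageIdToInfo.remove(s.stageId)
            }
            stages.remove(0, toRemove.size)
          }
        }

        def onStageCompleted(stage: StageInfo): Unit = {
          pendingStages.remove(stage.stageId)
          completedStages += stage
          trimIfNecessary(completedStages)
        }

        // A "phantom" stage that is submitted but never completes or fails
        // never reaches trimIfNecessary, so its map entries live forever.
      }
    ```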
    
    I also realized that there's a blocker memory leak in the stable progress 
reporting API: we never actually remove jobs from the `jobIdToData` map.  In 
fact, there's a somewhat embarrassing typo: _retailed_ jobs:
    
    ```scala
      val retailedJobs = conf.getInt("spark.ui.retainedJobs", 
DEFAULT_RETAINED_JOBS)
    ```
    
    This is my fault: I should have added a test that set 
`spark.ui.retainedJobs` to some low threshold and checked that we only reported 
information for that many jobs.  This was sloppy on my part, since there was 
already a corresponding "test LRU eviction of stages" test that I should have 
just copied and modified.
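
    Something along these lines is what I have in mind (a sketch only; the 
`newJobStartEvent` / `newJobEndEvent` helpers are hypothetical stand-ins for 
building the `SparkListenerJobStart` / `SparkListenerJobEnd` events with 
whatever constructors this branch uses):
    
    ```scala
      package org.apache.spark.ui.jobs

      import org.apache.spark.SparkConf
      import org.apache.spark.scheduler.{SparkListenerJobEnd, SparkListenerJobStart}
      import org.scalatest.FunSuite

      class JobProgressListenerRetentionSuite extends FunSuite {

        // Hypothetical helpers: construct the job start / end events using
        // whatever signatures these event classes have on this branch.
        private def newJobStartEvent(jobId: Int): SparkListenerJobStart = ???
        private def newJobEndEvent(jobId: Int): SparkListenerJobEnd = ???

        test("test LRU eviction of jobs") {
          val conf = new SparkConf().set("spark.ui.retainedJobs", "5")
          val listener = new JobProgressListener(conf)

          for (jobId <- 1 to 50) {
            listener.onJobStart(newJobStartEvent(jobId))
            listener.onJobEnd(newJobEndEvent(jobId))
          }

          // With spark.ui.retainedJobs = 5, only the five most recently
          // finished jobs should still be tracked; anything more is a leak.
          assert(listener.jobIdToData.size === 5)
        }
      }
    ```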
    
    I wonder if there's a better way to guarantee that we won't leak memory as 
we add more maps / views to `JobProgressListener`.  I think that Guava has some 
size-limited HashMaps with configurable eviction policies; this might be safer 
than relying on the proper cleanup code being called in the right place.
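
    For example, Guava's `CacheBuilder` with `maximumSize` gives us a bounded 
map whose eviction we don't have to wire up by hand (a rough sketch; `JobUIData` 
here is just a placeholder for whatever value type the map ends up holding):
    
    ```scala
      import com.google.common.cache.CacheBuilder

      object BoundedMapSketch {
        // Placeholder value type, for illustration only.
        case class JobUIData(jobId: Int)

        def main(args: Array[String]): Unit = {
          val retainedJobs = 5  // would come from spark.ui.retainedJobs

          // Guava keeps the cache bounded by maximumSize, evicting entries that
          // haven't been used recently, so a forgotten remove() can't cause
          // unbounded growth.
          val jobIdToData = CacheBuilder.newBuilder()
            .maximumSize(retainedJobs.toLong)
            .build[java.lang.Integer, JobUIData]()

          (1 to 50).foreach(i => jobIdToData.put(i, JobUIData(i)))

          println(jobIdToData.size())  // bounded by maximumSize, never 50
        }
      }
    ```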

