[ https://issues.apache.org/jira/browse/SPARK-12757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092646#comment-15092646 ]
Apache Spark commented on SPARK-12757:
--------------------------------------

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/10705

> Use reference counting to prevent blocks from being evicted during reads
> ------------------------------------------------------------------------
>
>                 Key: SPARK-12757
>                 URL: https://issues.apache.org/jira/browse/SPARK-12757
>             Project: Spark
>          Issue Type: Improvement
>          Components: Block Manager
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>
> As a pre-requisite to off-heap caching of blocks, we need a mechanism to
> prevent pages / blocks from being evicted while they are being read. With
> on-heap objects, evicting a block while it is being read merely leads to
> memory-accounting problems (because we assume that an evicted block is a
> candidate for garbage-collection, which will not be true during a read), but
> with off-heap memory this will lead to either data corruption or segmentation
> faults.
> To address this, we should add a reference-counting mechanism to track which
> blocks/pages are being read in order to prevent them from being evicted
> prematurely. I propose to do this in two phases: first, add a safe,
> conservative approach in which all BlockManager.get*() calls implicitly
> increment the reference count of blocks and where tasks' references are
> automatically freed upon task completion. This will be correct but may have
> adverse performance impacts because it will prevent legitimate block
> evictions. In phase two, we should incrementally add release() calls in order
> to fix the eviction of unreferenced blocks. The latter change may need to
> touch many different components, which is why I propose to do it separately
> in order to make the changes easier to reason about and review.
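
The two-phase proposal in the description can be illustrated with a small sketch. It is plain Python rather than Spark code, and every name in it (`RefCountedBlockStore`, `get`, `release`, `release_all_for_task`, `evict`) is hypothetical; it is not the API introduced by the pull request. It only shows the idea: reads pin a block, eviction is refused while any pin is held, and a task's pins are dropped in bulk when the task completes.

```python
import threading
from collections import defaultdict


class RefCountedBlockStore:
    """Toy block store with per-task reference counts (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._blocks = {}                     # block_id -> data
        self._ref_counts = defaultdict(int)   # block_id -> total pins
        # task_id -> {block_id: pins held by that task}
        self._task_refs = defaultdict(lambda: defaultdict(int))

    def put(self, block_id, data):
        with self._lock:
            self._blocks[block_id] = data

    def get(self, task_id, block_id):
        """Return the block and implicitly pin it on behalf of task_id."""
        with self._lock:
            if block_id not in self._blocks:
                return None
            self._ref_counts[block_id] += 1
            self._task_refs[task_id][block_id] += 1
            return self._blocks[block_id]

    def release(self, task_id, block_id):
        """Phase two: drop a single pin as soon as the read is finished."""
        with self._lock:
            held = self._task_refs.get(task_id, {})
            if held.get(block_id, 0) == 0:
                raise ValueError("task holds no reference to this block")
            held[block_id] -= 1
            if held[block_id] == 0:
                del held[block_id]
            self._ref_counts[block_id] -= 1

    def release_all_for_task(self, task_id):
        """Phase one: drop every pin held by a task when it completes."""
        with self._lock:
            held = self._task_refs.pop(task_id, {})
            released = 0
            for block_id, count in held.items():
                self._ref_counts[block_id] -= count
                released += count
            return released

    def evict(self, block_id):
        """Remove the block only if nobody is reading it."""
        with self._lock:
            if self._ref_counts.get(block_id, 0) > 0:
                return False  # still pinned; eviction must wait
            self._ref_counts.pop(block_id, None)
            return self._blocks.pop(block_id, None) is not None


store = RefCountedBlockStore()
store.put("rdd_0_0", b"payload")

# Phase one: get() pins implicitly; only task completion unpins.
store.get("task-1", "rdd_0_0")
store.get("task-1", "rdd_0_0")
blocked = store.evict("rdd_0_0")             # refused while pinned
freed = store.release_all_for_task("task-1")
evicted = store.evict("rdd_0_0")             # allowed once unpinned
```

In this sketch the conservative phase corresponds to calling only `release_all_for_task`, which keeps blocks pinned for the whole task lifetime and so can block legitimate evictions. The explicit `release` call stands in for the finer-grained phase-two change, letting a block become evictable as soon as its reader is done.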
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org