[ 
https://issues.apache.org/jira/browse/SPARK-12757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810913#comment-15810913
 ] 

Apache Spark commented on SPARK-12757:
--------------------------------------

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/16513

> Use reference counting to prevent blocks from being evicted during reads
> ------------------------------------------------------------------------
>
>                 Key: SPARK-12757
>                 URL: https://issues.apache.org/jira/browse/SPARK-12757
>             Project: Spark
>          Issue Type: Improvement
>          Components: Block Manager
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>             Fix For: 2.0.0
>
>
> As a pre-requisite to off-heap caching of blocks, we need a mechanism to 
> prevent pages / blocks from being evicted while they are being read. With 
> on-heap objects, evicting a block while it is being read merely leads to 
> memory-accounting problems (because we assume that an evicted block is a 
> candidate for garbage-collection, which will not be true during a read), but 
> with off-heap memory this will lead to either data corruption or segmentation 
> faults.
> To address this, we should add a reference-counting mechanism to track which 
> blocks/pages are being read in order to prevent them from being evicted 
> prematurely. I propose to do this in two phases: first, add a safe, 
> conservative approach in which all BlockManager.get*() calls implicitly 
> increment the reference count of blocks and where tasks' references are 
> automatically freed upon task completion. This will be correct but may have 
> adverse performance impacts because it will prevent legitimate block 
> evictions. In phase two, we should incrementally add release() calls in order 
> to fix the eviction of unreferenced blocks. The latter change may need to 
> touch many different components, which is why I propose to do it separately 
> in order to make the changes easier to reason about and review.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to