[ 
https://issues.apache.org/jira/browse/SPARK-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627513#comment-14627513
 ] 

Dibyendu Bhattacharya commented on SPARK-8591:
----------------------------------------------

This is a summary of the issue, as described in the PR.

Problem summary. If a block fails to unroll, the ReceiverTracker will never 
know about the block and will not include it in a future computation. In the 
meantime, however, the block may be replicated and take up space on other 
executors even though it will never be used.
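
To make the failure mode concrete, here is a minimal, self-contained Scala 
sketch of the flow described above. The helper names (unrollToMemory, 
replicateToPeer, reportToMaster) are illustrative stubs, not the actual 
BlockManager API.

    object UnrollSketch {
      // Assume memory pressure: the unroll, and hence the local store, fails.
      def unrollToMemory(blockId: String): Boolean = false

      def replicateToPeer(blockId: String): Unit =
        println(s"replicated $blockId to a peer executor")

      def reportToMaster(blockId: String): Unit =
        println(s"reported $blockId to the BlockManagerMaster")

      def put(blockId: String, replicationRequested: Boolean): Unit = {
        val storedLocally = unrollToMemory(blockId)
        // Replication proceeds even though the local store failed ...
        if (replicationRequested) replicateToPeer(blockId)
        // ... but the status is only reported on success, so the
        // ReceiverTracker never learns about the replicated copy.
        if (storedLocally) reportToMaster(blockId)
      }

      def main(args: Array[String]): Unit =
        put("input-0-1436900000000", replicationRequested = true)
    }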

Implications for Spark core. For Spark core, however, it is reasonable to 
replicate a block even if it fails to unroll. Just because there is not enough 
memory to cache the block on this executor doesn't mean the same is true on a 
different executor. This is all best-effort, but a future computation of the 
RDD will still benefit from having the block cached somewhere. (Note: the 
existing code doesn't actually do this for normal RDD caching yet, because the 
CacheManager has its own unrolling logic. We will address this separately in 
the future.)
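
As a small illustration of the best-effort argument, the sketch below assumes 
a local-then-remote lookup order on a later computation; getLocal and 
getRemote are illustrative stand-ins, not the real BlockManager signatures.

    object BestEffortRead {
      // The local unroll failed on this executor, but a peer holds a replica.
      def getLocal(blockId: String): Option[Seq[Int]] = None
      def getRemote(blockId: String): Option[Seq[Int]] = Some(Seq(1, 2, 3))

      // Look locally first, then remotely, and only recompute as a last resort.
      def getOrCompute(blockId: String)(compute: => Seq[Int]): Seq[Int] =
        getLocal(blockId)
          .orElse(getRemote(blockId)) // the replica saves a recomputation here
          .getOrElse(compute)

      def main(args: Array[String]): Unit =
        println(getOrCompute("rdd_2_0") { println("recomputing"); Seq(1, 2, 3) })
    }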

Alternative fix. The right fix for SPARK-8591 would be to have the 
ReceiverTracker read its blocks directly from the BlockManagerMaster. This 
would unify the two divergent block-reporting code paths. Since the 
BlockManagerMaster is notified of replicated blocks, the replication here 
would also help mitigate data loss in the case of MEMORY_ONLY_*.
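
For illustration, here is a rough Scala sketch of what that refactor could 
look like. The getLocations call mirrors what the BlockManagerMaster exposes 
conceptually, but the code below is hypothetical, not the actual 
ReceiverTracker.

    object TrackerSketch {
      // The master hears about replicated copies even when the local store on
      // the receiving executor failed, so it is the authoritative source.
      def getLocations(blockId: String): Seq[String] = Seq("executor-2")

      // A tracker that consults the master would count the replica as usable.
      def isBlockUsable(blockId: String): Boolean = getLocations(blockId).nonEmpty

      def main(args: Array[String]): Unit =
        println(s"block usable: ${isBlockUsable("input-0-1436900000000")}")
    }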

TL;DR. This patch removes a small feature from the block manager that, though 
not currently used, is desirable in the future for both Spark core and Spark 
Streaming. However, the underlying issue is not caused by a bug in the block 
manager, but by an incorrect assumption in the ReceiverTracker that does not 
take replication into account. The correct way forward is to fix this in Spark 
Streaming by refactoring the ReceiverTracker to depend on the 
BlockManagerMaster.

> Block failed to unroll to memory should not be replicated for MEMORY_ONLY_2 
> StorageLevel
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-8591
>                 URL: https://issues.apache.org/jira/browse/SPARK-8591
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.4.0
>            Reporter: Dibyendu Bhattacharya
>
> A block that fails to unroll to memory (returning an iterator and a size of 
> 0) should not be replicated to a peer node, because its putBlockStatus comes 
> back as StorageLevel.NONE and the BlockStatus is never reported to the Master.
> The primary issue here is that, for StorageLevel MEMORY_ONLY_2, if the 
> BlockManager fails to unroll the block to memory and the local store fails, 
> the BlockManager still replicates the block to a remote peer. In the Spark 
> Streaming case, the Receiver gets the PutResult from the local BlockManager, 
> and if the block fails to store locally, ReceivedBlockHandler throws a 
> SparkException back to the Receiver even though the BlockManager successfully 
> replicated the block to the remote peer. This wastes memory on the remote 
> peer, since that block can never be used in Streaming jobs. When a Receiver 
> fails to store a block it can retry, and every failed retry (of the local 
> store) may add another unused block on a remote peer, so high-volume 
> receivers doing multiple retries can leave many unwanted blocks behind.
> The fix proposed here is to stop replicating the block when the local store 
> has failed. This prevents the scenario above and does not affect RDD 
> partition replication (during cache or persist), because the RDD 
> CacheManager unrolls to memory before attempting to store in local memory, 
> so it can never happen that the block unrolls successfully but the local 
> memory store fails.
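
The fix quoted above amounts to guarding replication on the outcome of the 
local store. Here is a minimal Scala sketch of that guard; the helpers are 
illustrative stubs, not the real BlockManager internals.

    object FixSketch {
      // The unroll fails, so the local store fails.
      def storeLocally(blockId: String): Boolean = false

      def replicateToPeer(blockId: String): Unit =
        println(s"replicated $blockId to a peer executor")

      def put(blockId: String, replicationRequested: Boolean): Boolean = {
        val storedLocally = storeLocally(blockId)
        // Guard: replicate only once the local store has succeeded, so a
        // failed (and retried) put cannot strand unused replicas on peers.
        if (storedLocally && replicationRequested) replicateToPeer(blockId)
        storedLocally
      }

      def main(args: Array[String]): Unit =
        println(s"stored: ${put("input-0-1436900000000", replicationRequested = true)}")
    }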



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
