[ 
https://issues.apache.org/jira/browse/SPARK-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dibyendu Bhattacharya updated SPARK-8591:
-----------------------------------------
    Description: 
A block that fails to unroll to memory (the put returns an iterator and size 0)
should not be replicated to a peer node, because its putBlockStatus comes back
as StorageLevel.NONE and the BlockStatus is never reported to the Master.

The primary issue is that, for StorageLevel MEMORY_ONLY_2, if the BlockManager
fails to unroll the block to memory (so the local store fails), it still
replicates the same block to a remote peer. In the Spark Streaming case, the
Receiver gets the PutResult from the local BlockManager, and when the block
fails to store locally, ReceivedBlockHandler throws a SparkException back to
the Receiver even though the BlockManager has successfully replicated the block
to the remote peer. This wastes memory on the remote peer, because that block
can never be used by Streaming jobs. When a Receiver fails to store a block it
may retry, and every failed retry (of the local store) can add yet another
unused block on the remote peer, so high-volume receivers that retry several
times can leave many unwanted blocks behind.
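
To make the failure mode concrete, below is a minimal, self-contained Scala
sketch. It is a toy model of the behaviour described above, not the actual
Spark code: ToyBlockManager.putIterator stands in for BlockManager.doPut on the
receiver path with a *_2 storage level, and the check at the end stands in for
ReceivedBlockHandler's error handling.

{code}
// Toy model of the current behaviour (illustrative only, not Spark source):
// replication is started whether or not the local memory store (the unroll)
// succeeded.
object ReplicateDespiteLocalFailureDemo {

  final case class ToyPutResult(storedLocally: Boolean, replicatedToPeer: Boolean)

  final class ToyBlockManager {
    var remoteBlocks: Set[String] = Set.empty

    def putIterator(blockId: String, unrollSucceeds: Boolean): ToyPutResult = {
      val storedLocally = unrollSucceeds   // outcome of the local memory store
      remoteBlocks += blockId              // replication happens regardless
      ToyPutResult(storedLocally, replicatedToPeer = true)
    }
  }

  def main(args: Array[String]): Unit = {
    val bm = new ToyBlockManager
    // Receiver path: the block fails to unroll, so the local store fails ...
    val result = bm.putIterator("input-0-1", unrollSucceeds = false)
    // ... the handler throws back to the receiver, which may retry, yet the
    // peer already holds a copy that no Streaming job will ever read.
    if (!result.storedLocally) {
      println(s"local store failed, but peer now holds: ${bm.remoteBlocks}")
    }
  }
}
{code}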

The proposed fix is to stop replicating the block when the local store has
failed. This prevents the scenario described above and does not affect RDD
partition replication (during cache or persist), because the RDD CacheManager
unrolls the block to memory before attempting to store it in local memory, so
the case where the unroll succeeds but the store to local memory fails can
never arise.
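
A minimal sketch of the guard the fix amounts to, again as a toy model rather
than a patch against the real BlockManager: replication is attempted only
after the local store is known to have succeeded, so a failed unroll never
leaves an orphan copy on the peer.

{code}
// Illustrative sketch of the proposed guard (not the actual patch): replicate
// only when the local store succeeded.
object ReplicateOnlyOnLocalSuccess {

  def putAndMaybeReplicate(blockId: String,
                           storeLocally: String => Boolean,
                           replicateToPeer: String => Unit): Boolean = {
    val storedLocally = storeLocally(blockId)
    if (storedLocally) {
      replicateToPeer(blockId)   // only reached when the local put succeeded
    }
    storedLocally                // caller (e.g. the receiver) may retry on false
  }

  def main(args: Array[String]): Unit = {
    var peerBlocks = List.empty[String]
    // Local store fails (the block could not be unrolled): nothing reaches the peer.
    val stored = putAndMaybeReplicate(
      "input-0-1",
      storeLocally = (_: String) => false,
      replicateToPeer = (id: String) => { peerBlocks = id :: peerBlocks })
    println(s"stored locally: $stored, blocks on peer: $peerBlocks")
  }
}
{code}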




> Block failed to unroll to memory should not be replicated for MEMORY_ONLY_2 
> StorageLevel
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-8591
>                 URL: https://issues.apache.org/jira/browse/SPARK-8591
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager
>    Affects Versions: 1.4.0
>            Reporter: Dibyendu Bhattacharya
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
