[jira] [Commented] (SPARK-8591) Block failed to unroll to memory should not be replicated for MEMORY_ONLY_2 StorageLevel

2015-07-14 Thread Dibyendu Bhattacharya (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627517#comment-14627517
 ] 

Dibyendu Bhattacharya commented on SPARK-8591:
--

This will be resolved as Won't Fix, as suggested by [~tdas] in the PR.

 Block failed to unroll to memory should not be replicated for MEMORY_ONLY_2 
 StorageLevel
 

 Key: SPARK-8591
 URL: https://issues.apache.org/jira/browse/SPARK-8591
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.4.0
Reporter: Dibyendu Bhattacharya

 A block that fails to unroll to memory (returning an iterator and size 0)
 should not be replicated to a peer node, because putBlockStatus comes back as
 StorageLevel.NONE and the BlockStatus is never reported to the Master.
 The primary issue: for StorageLevel MEMORY_ONLY_2, if the BlockManager fails
 to unroll the block to memory and the local store fails, the BlockManager
 still replicates the same block to a remote peer. In the Spark Streaming
 case, the Receiver gets the PutResult from the local BlockManager, and if the
 block failed to store locally, ReceivedBlockHandler throws a SparkException
 back to the Receiver even though the BlockManager successfully replicated the
 block to the remote peer. This wastes memory on the remote peer, since that
 block can never be used by Streaming jobs. If the Receiver fails to store the
 block, it can retry, and every failed retry (of the local store) may add
 another unused block on the remote peer, so a high-volume receiver doing
 multiple retries can accumulate many unwanted blocks.
 The proposed fix is to stop replicating the block if the local store has
 failed. This prevents the scenario above and does not affect RDD partition
 replication (during cache or persist), because the RDD CacheManager unrolls
 to memory before attempting to store in local memory, so it can never happen
 that the block unrolls successfully but the local memory store fails.
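
A minimal sketch of the proposed guard (all names below are illustrative, not
the actual BlockManager internals): replication is attempted only when the
local store succeeds, so a block that failed to unroll is never pushed to a
remote peer.

{code:scala}
// Hypothetical sketch of the proposed behaviour; names are illustrative
// and do not correspond to the real BlockManager code.
object ReplicationGuardSketch {
  // Stand-in for the result of the local put: how many bytes were stored.
  final case class LocalPutResult(bytesStored: Long)

  def putAndMaybeReplicate(
      replication: Int,                    // e.g. 2 for MEMORY_ONLY_2
      storeLocally: () => LocalPutResult,  // attempt to unroll and store locally
      replicateToPeer: () => Unit): Unit = {
    val storedLocally = storeLocally().bytesStored > 0
    // Proposed fix: only replicate when the local store succeeded.
    if (replication > 1 && storedLocally) {
      replicateToPeer()
    }
  }
}
{code}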





[jira] [Commented] (SPARK-8591) Block failed to unroll to memory should not be replicated for MEMORY_ONLY_2 StorageLevel

2015-07-14 Thread Dibyendu Bhattacharya (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627513#comment-14627513
 ] 

Dibyendu Bhattacharya commented on SPARK-8591:
--

This is the summary of the issue as mentioned in the PR:

Problem summary. If a block fails to unroll, the ReceiverTracker will never 
know about the block and will not include it in a future computation. In the 
meantime, however, the block may be replicated and take up space on other 
executors even though it will never be used.

Implications for Spark core. For Spark core, however, it is reasonable to 
replicate a block even if it fails to unroll. Just because there is not enough 
memory to cache this block on this executor doesn't mean the same is true on a 
different executor. This is all best effort, but a future computation of the 
RDD will still benefit from having the cached block somewhere. (Note: the 
existing code doesn't actually do this for normal RDD caching yet because 
CacheManager has its own unrolling logic. We will address this separately in 
the future.)
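
For context, the replication factor is simply part of the storage level an
application requests when persisting an RDD. A minimal usage sketch follows
(in local mode the second replica cannot actually be placed, but the request
looks the same):

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// MEMORY_ONLY_2 asks for an in-memory copy on two executors; caching is
// best effort, so a failed unroll on one executor does not fail the job.
val sc = new SparkContext(
  new SparkConf().setAppName("persist-sketch").setMaster("local[2]"))
val rdd = sc.parallelize(1 to 1000000).map(_ * 2)
rdd.persist(StorageLevel.MEMORY_ONLY_2)
rdd.count()  // first action materializes and (best effort) caches the partitions
{code}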

Alternative fix. The right fix for SPARK-8591 would be to have the 
ReceiverTracker just read its blocks from the BlockManagerMaster. This 
simplifies the two divergent block reporting code paths. Since the 
BlockManagerMaster is notified of replicated blocks, the replication here will 
also help mitigate data loss in the case of MEMORY_ONLY_*.
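
A hypothetical sketch of that direction (the trait and method names such as
BlockLocationSource and streamBlocksFor are made up for illustration and are
not the real ReceiverTracker or BlockManagerMaster APIs): the tracker derives
its view of available blocks from the master, which already knows about
remote replicas.

{code:scala}
// Illustrative only; not the real ReceiverTracker/BlockManagerMaster code.
object TrackerFromMasterSketch {
  final case class StreamBlock(streamId: Int, blockId: String, locations: Set[String])

  // Stand-in for whatever query the BlockManagerMaster could expose.
  trait BlockLocationSource {
    def streamBlocksFor(streamId: Int): Seq[StreamBlock]
  }

  // A block that exists only as a remote replica (local store failed but
  // replication succeeded) is still visible to the tracker this way.
  def blocksForBatch(source: BlockLocationSource, streamId: Int): Seq[StreamBlock] =
    source.streamBlocksFor(streamId).filter(_.locations.nonEmpty)
}
{code}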

TL;DR. This patch removes a small feature from the block manager that, though 
not currently used, is desirable in the future for both Spark core and Spark 
Streaming. However, the underlying issue is not caused by a bug in the block 
manager, but by an incorrect assumption in the ReceiverTracker that does not 
take replication into account. The correct way forward would be to fix this in 
Spark Streaming by refactoring the ReceiverTracker to depend on the 
BlockManagerMaster.



[jira] [Commented] (SPARK-8591) Block failed to unroll to memory should not be replicated for MEMORY_ONLY_2 StorageLevel

2015-06-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599541#comment-14599541
 ] 

Apache Spark commented on SPARK-8591:
-

User 'dibbhatt' has created a pull request for this issue:
https://github.com/apache/spark/pull/6990

 Block failed to unroll to memory should not be replicated for MEMORY_ONLY_2 
 StorageLevel
 

 Key: SPARK-8591
 URL: https://issues.apache.org/jira/browse/SPARK-8591
 Project: Spark
  Issue Type: Bug
  Components: Block Manager
Affects Versions: 1.4.0
Reporter: Dibyendu Bhattacharya

 A block that fails to unroll to memory (returning an iterator and size 0)
 should not be replicated to a peer node, because putBlockStatus comes back as
 StorageLevel.NONE and the BlockStatus is never reported to the Master.


