Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/2366#issuecomment-55472830
  
    What happens when a recomputation results in the same blockId being 
regenerated (an unpersist followed by recompute/persist, a block drop 
followed by recomputation, or something else)? It will now go to some 
random node, potentially not the same one selected previously, resulting 
in over-replication?
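
    For illustration, a rough sketch of a deterministic peer choice that 
would avoid this. Everything in it (the `pickPeers` name, the `String` 
peer type, the hashing scheme) is my own assumption, not code from this 
PR: hashing the blockId means a regenerated block maps back to the same 
targets instead of a fresh random set.

    ```scala
    // Hypothetical sketch: pick replication targets deterministically from
    // the blockId, so recomputation re-selects the same peers.
    def pickPeers(blockId: String, peers: Seq[String], numReplicas: Int): Seq[String] = {
      if (peers.isEmpty) Seq.empty
      else {
        val sorted = peers.sorted  // stable ordering across calls
        val start = ((blockId.hashCode % sorted.size) + sorted.size) % sorted.size
        (0 until math.min(numReplicas, sorted.size))
          .map(i => sorted((start + i) % sorted.size))  // same blockId -> same peers
      }
    }
    ```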
    
    An even more corner case: if the computation was not idempotent and 
produced changed data for the block, earlier it would simply get 
overwritten as part of replication. Will we now have two nodes with the 
same data and a third (the one initially replicated to) which can 
diverge?
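
    To make that corner case concrete, a tiny non-idempotent computation 
(assuming a live `SparkContext` in `sc`; this is an illustration, not 
from the PR):

    ```scala
    // Recomputing a partition of this RDD yields different data each time,
    // so replicas written at different times need not agree.
    val nonIdempotent = sc.parallelize(1 to 4).map(_ => scala.util.Random.nextInt())
    ```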
    
    Btw, from what I saw, node loss is not handled, right? So a block can 
become under-replicated? Would be nice if we added that some day ...
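
    A sketch of roughly what that could look like (purely hypothetical; 
nothing like this is in the PR, and the names and the blockId-to-hosts 
map are made up): on node loss, find the blocks whose replica count 
dropped below target and re-replicate them.

    ```scala
    // Hypothetical under-replication check after a node is lost.
    def blocksNeedingReplicas(lostHost: String,
                              replicas: Map[String, Set[String]], // blockId -> hosts holding it
                              targetReplication: Int): Seq[String] =
      replicas.collect {
        case (blockId, hosts) if (hosts - lostHost).size < targetReplication => blockId
      }.toSeq
    ```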
    
    
    Streaming is not the only application for replication :-) We use it in 
conjunction with locality wait levels to speed up computation when speculative 
execution is enabled.
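
    Roughly, that setup looks like this (the config values and input path 
are illustrative; `spark.speculation`, `spark.locality.wait`, and 
`MEMORY_ONLY_2` are stock Spark knobs):

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    val conf = new SparkConf()
      .setAppName("replication-plus-speculation")
      .set("spark.speculation", "true")    // launch speculative copies of slow tasks
      .set("spark.locality.wait", "1000")  // ms to wait before relaxing locality level
    val sc = new SparkContext(conf)

    // With two in-memory replicas, a speculative attempt of a straggling
    // task can be scheduled node-local on the second copy.
    val cached = sc.textFile("hdfs:///path/to/input").persist(StorageLevel.MEMORY_ONLY_2)
    ```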

