GitHub user brad-kaiser commented on the issue:

    https://github.com/apache/spark/pull/19041
  
    Hi @squito,
    
    The back-and-forth communication between CacheRecoveryManager and 
BlockManagerMasterEndpoint exists so that we always have an up-to-date view of 
which executors are undergoing cache recovery, and so that we don't replicate 
blocks to those executors. If you look at recoverLatestBlock, we include the 
contents of the recoveringExecutors cache. 
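
    To make the idea concrete, here is a minimal, self-contained sketch of the 
exclusion logic. It is not the code from the pull request: the class names 
MasterEndpointSketch and RecoveryManagerSketch, the methods pickReplicationTarget, 
startRecovery, finishRecovery and recoverBlock, and the string ids are all 
hypothetical stand-ins, and the real code talks over RPC rather than through 
direct method calls.

```scala
// Hypothetical, simplified model of the idea described above.
// None of these names are Spark APIs; they are illustration only.
object CacheRecoverySketch {
  type ExecutorId = String
  type BlockId = String

  // Stand-in for the replication decision made on the master endpoint side.
  final class MasterEndpointSketch(
      locations: Map[BlockId, Set[ExecutorId]],
      allExecutors: Set[ExecutorId]) {

    // Pick a target that neither holds the block already nor is excluded.
    // Sorting only makes the choice deterministic for this example.
    def pickReplicationTarget(
        blockId: BlockId,
        excluded: Set[ExecutorId]): Option[ExecutorId] = {
      val holders = locations.getOrElse(blockId, Set.empty[ExecutorId])
      (allExecutors -- holders -- excluded).toSeq.sorted.headOption
    }
  }

  // Stand-in for the manager that owns the set of recovering executors.
  final class RecoveryManagerSketch(master: MasterEndpointSketch) {
    private var recoveringExecutors = Set.empty[ExecutorId]

    def startRecovery(executorId: ExecutorId): Unit =
      recoveringExecutors += executorId

    def finishRecovery(executorId: ExecutorId): Unit =
      recoveringExecutors -= executorId

    // Every request carries the current set, so the endpoint never
    // replicates a block onto an executor that is itself being drained.
    def recoverBlock(blockId: BlockId): Option[ExecutorId] =
      master.pickReplicationTarget(blockId, recoveringExecutors)
  }
}
```

    In this sketch the set of recovering executors lives only in the manager and 
is passed along with each request, so the endpoint needs no recovery-specific 
state of its own.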
    
    We could conceivably move that cache into BlockManagerMasterEndpoint, but I 
think that would end up being messier. I wanted to keep all the cache recovery 
code localized and not clutter up BlockManagerMasterEndpoint. 
CacheRecoveryManager and BlockManagerMasterEndpoint will also be local to the 
same process, so RPC calls between them should be cheap, especially compared to 
the time it will take to copy blocks around. 
    
    I will look into the race between removing the block and replicating the 
next block. 
    
    Thanks
    



