Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19041#discussion_r178964472
  
    --- Diff: 
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala 
---
    @@ -250,6 +255,44 @@ class BlockManagerMasterEndpoint(
         blockManagerIdByExecutor.get(execId).foreach(removeBlockManager)
       }
     
    +  private def recoverLatestRDDBlock(
    +      execId: String,
    +      excludeExecutors: Seq[String],
    +      context: RpcCallContext): Unit = {
    +    logDebug(s"Replicating first cached block on $execId")
    +    val excluded = excludeExecutors.flatMap(blockManagerIdByExecutor.get)
    +    val response: Option[Future[Boolean]] = for {
    +      blockManagerId <- blockManagerIdByExecutor.get(execId)
    +      info <- blockManagerInfo.get(blockManagerId)
    +      blocks = info.cachedBlocks.collect { case r: RDDBlockId => r }
    +      // As a heuristic, prioritize replicating the latest rdd. If this 
succeeds,
    +      // CacheRecoveryManager will try to replicate the remaining rdds.
    --- End diff --
    
    Rather than the latest RDD, it would actually make more sense to take 
advantage of the LRU ordering already maintained in the MemoryStore: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala#L91
    
    But maybe that is not easy to expose.
    
    But I think that also means that the *receiving* end will put the 
replicated block at the back of that LinkedHashMap, even though it really 
hasn't been accessed at all.
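
    To make the concern concrete, here is a minimal sketch (not Spark code; `LruSketch` and its block IDs are hypothetical) of the access-ordered `LinkedHashMap` behavior the MemoryStore relies on: both `get` and `put` move an entry to the back (most-recently-used) end, so a freshly replicated block that was only ever *put* ends up looking like the hottest block on the receiving executor.

    ```scala
    import java.util.LinkedHashMap
    import scala.jdk.CollectionConverters._

    object LruSketch {
      // Simplified stand-in for MemoryStore's `entries` map, which is a
      // LinkedHashMap constructed with accessOrder = true: iteration runs
      // from the least-recently-used entry to the most-recently-used one.
      def order: Seq[String] = {
        val entries = new LinkedHashMap[String, Long](32, 0.75f, true)
        entries.put("rdd_1_0", 100L)
        entries.put("rdd_1_1", 100L)
        entries.put("rdd_2_0", 100L)

        // A read moves rdd_1_0 to the back (most recently used).
        entries.get("rdd_1_0")

        // A replicated block arrives via put, which *also* lands at the
        // back -- even though no task on this executor has read it.
        entries.put("rdd_3_0", 100L)

        entries.keySet().iterator().asScala.toSeq
      }

      def main(args: Array[String]): Unit =
        // LRU first ... MRU last; the unread replica sorts as hottest.
        println(order.mkString(", "))
    }
    ```

    So under this model, `rdd_3_0` would be the *last* candidate for eviction on the receiver, despite never having been accessed.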

