vanzin commented on a change in pull request #24499: [SPARK-27677][Core] Serve local disk persisted blocks by the external service after releasing executor by dynamic allocation
URL: https://github.com/apache/spark/pull/24499#discussion_r285741563
########## File path: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala ##########
@@ -149,29 +162,70 @@ class BlockManagerMasterEndpoint(
     // First remove the metadata for the given RDD, and then asynchronously remove the blocks
     // from the slaves.
+    // The message sent to the slaves to remove the RDD
+    val removeMsg = RemoveRdd(rddId)
+
     // Find all blocks for the given RDD, remove the block from both blockLocations and
-    // the blockManagerInfo that is tracking the blocks.
+    // the blockManagerInfo that is tracking the blocks and create the futures which asynchronously
+    // remove the blocks from slaves and gives back the number of removed blocks
     val blocks = blockLocations.asScala.keys.flatMap(_.asRDDId).filter(_.rddId == rddId)
+    val blocksToDeleteByShuffleService =
+      new mutable.HashMap[BlockManagerId, mutable.HashSet[RDDBlockId]]
+
     blocks.foreach { blockId =>
-      val bms: mutable.HashSet[BlockManagerId] = blockLocations.get(blockId)
-      bms.foreach(bm => blockManagerInfo.get(bm).foreach(_.removeBlock(blockId)))
-      blockLocations.remove(blockId)
+      val bms: mutable.HashSet[BlockManagerId] = blockLocations.remove(blockId)
+
+      val (bmIdExtShuffle, bmIdExecutor) = bms.partition(_.port == externalShuffleServicePort)
+      if (bmIdExecutor.isEmpty && bmIdExtShuffle.nonEmpty) {

Review comment:
Is this correct? What happens when a block has replication=2, for example, with one replica on a live executor and the other on a dead one? As far as I can tell, you'd fail to remove the blocks from the dead executor's local dirs in that case.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
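To make the reviewer's replication=2 scenario concrete, here is a simplified, standalone Scala model of the partitioning step in the diff. It is a sketch under stated assumptions, not Spark code. `BmId`, `RddBlock`, `RddRemovalSketch`, `guardedPlan` and `unconditionalPlan` are hypothetical names. A replica served by the external shuffle service is assumed to be recorded under a location whose port equals `externalShuffleServicePort` (as the `partition` call in the diff implies), set here to 7337, Spark's default `spark.shuffle.service.port`. Because the hunk is cut off at the `if`, the sketch also assumes shuffle-service locations are queued for deletion only inside that branch.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for Spark's BlockManagerId and RDDBlockId.
final case class BmId(executorId: String, host: String, port: Int)
final case class RddBlock(rddId: Int, splitIndex: Int)

object RddRemovalSketch {
  // Assumption: the external shuffle service listens on Spark's default port.
  val externalShuffleServicePort = 7337

  // Builds a "which shuffle service deletes which blocks" map, analogous to
  // blocksToDeleteByShuffleService in the diff. `select` receives the
  // shuffle-service locations and the live-executor locations of one block
  // and returns the shuffle-service locations to queue for deletion.
  private def plan(
      locations: Map[RddBlock, Set[BmId]],
      select: (Set[BmId], Set[BmId]) => Set[BmId]): Map[BmId, Set[RddBlock]] = {
    val toDelete = mutable.HashMap.empty[BmId, mutable.HashSet[RddBlock]]
    for ((blockId, bms) <- locations) {
      // Same split as the diff: shuffle-service locations vs. live executors.
      val (extShuffle, executors) = bms.partition(_.port == externalShuffleServicePort)
      for (bm <- select(extShuffle, executors)) {
        toDelete.getOrElseUpdate(bm, mutable.HashSet.empty[RddBlock]) += blockId
      }
    }
    toDelete.map { case (bm, ids) => bm -> ids.toSet }.toMap
  }

  // Mirrors the condition under review: shuffle-service copies are queued
  // only when no live executor also holds the block.
  def guardedPlan(locations: Map[RddBlock, Set[BmId]]): Map[BmId, Set[RddBlock]] =
    plan(locations, (extShuffle, executors) =>
      if (executors.isEmpty && extShuffle.nonEmpty) extShuffle else Set.empty)

  // Alternative: always queue shuffle-service copies, whether or not a live
  // executor holds another replica of the same block.
  def unconditionalPlan(locations: Map[RddBlock, Set[BmId]]): Map[BmId, Set[RddBlock]] =
    plan(locations, (extShuffle, _) => extShuffle)
}
```

For a block with one replica on a live executor (say port 40001) and one served by the shuffle service (port 7337), `guardedPlan` returns an empty map, so the dead executor's on-disk copy is never scheduled for removal, which is the leak the comment describes. `unconditionalPlan` queues that copy regardless; the live replica would presumably still be removed through the `RemoveRdd` message built at the top of the hunk, which this sketch does not model.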