ivoson commented on code in PR #39459: URL: https://github.com/apache/spark/pull/39459#discussion_r1103794439
##########
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala:
##########

@@ -77,6 +77,11 @@ class BlockManagerMasterEndpoint(
   // Mapping from block id to the set of block managers that have the block.
   private val blockLocations = new JHashMap[BlockId, mutable.HashSet[BlockManagerId]]
+  // Mapping from task id to the set of rdd blocks which are generated from the task.
+  private val tidToRddBlockIds = new mutable.HashMap[Long, mutable.HashSet[RDDBlockId]]
+  // Record the visible RDD blocks which have been generated at least from one successful task.
+  private val visibleRDDBlocks = new mutable.HashSet[RDDBlockId]

Review Comment:
> If existing blocks are lost - why would you need that information as they are gone? In other words, how is it different from today's situation (without visibility) - if a block is lost, it is no longer in system.

Here is an example of the scenario I am trying to describe:
1. We have a cached block rdd_1_1 which has been successfully cached and marked as visible.
2. The cached block gets lost due to executor loss.
3. Another task on rdd1 gets submitted, and its 1st attempt fails after putting the cached block rdd_1_1. For the 2nd attempt, things could play out differently:
   a. if we still have the visibility status, the 2nd attempt can use the cached block directly;
   b. otherwise, we still need to do the computing.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
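The bookkeeping being argued for can be sketched as a small standalone model (a hypothetical simplification, not the actual `BlockManagerMasterEndpoint` code: the field names mirror the diff, but `RDDBlockId`, `VisibilityTracker`, and the method names here are stand-ins). The key property from the scenario above is that visibility, once earned by a successful task, is tracked independently of block locations, so it can survive the loss of the block itself:

```scala
import scala.collection.mutable

// Hypothetical stand-in for org.apache.spark.storage.RDDBlockId.
case class RDDBlockId(rddId: Int, splitIndex: Int)

// Simplified sketch of the visibility bookkeeping discussed in the review.
class VisibilityTracker {
  // Mapping from task id to the set of RDD blocks the task has put so far.
  private val tidToRddBlockIds = new mutable.HashMap[Long, mutable.HashSet[RDDBlockId]]
  // RDD blocks generated by at least one *successful* task.
  private val visibleRDDBlocks = new mutable.HashSet[RDDBlockId]

  // Called when a running task reports that it cached a block.
  def blockPut(tid: Long, blockId: RDDBlockId): Unit =
    tidToRddBlockIds.getOrElseUpdate(tid, new mutable.HashSet[RDDBlockId]) += blockId

  // On task success, every block the task put becomes visible;
  // a failed task's entry would simply be dropped without this promotion.
  def taskSucceeded(tid: Long): Unit =
    tidToRddBlockIds.remove(tid).foreach(visibleRDDBlocks ++= _)

  // Note: visibility is not cleared when a block's replicas are lost, which is
  // exactly what lets a later re-put of rdd_1_1 be trusted immediately (case 3a).
  def isVisible(blockId: RDDBlockId): Boolean = visibleRDDBlocks.contains(blockId)
}
```

Walking the scenario through this sketch: a task puts `rdd_1_1` and succeeds, so the block becomes visible; the executor holding it is then lost; when a failed re-attempt puts `rdd_1_1` again, `isVisible` still returns true, so the 2nd attempt can use the cache instead of recomputing.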