Github user squito commented on the pull request:

    https://github.com/apache/spark/pull/2851#issuecomment-74293360
  
    Hi @CodingCat ,
    
    thanks for working on this, sorry the review has been dragging out.  This 
will be a nice addition, but I have some concerns similar to the other 
reviewers -- but lemme explain in a little more detail.  I don't necessarily 
see anything wrong with your approach for getting the Broadcast block info by 
itself, but it seems like it ought to share a similar code path to tracking the 
RDDBlocks.  RDDBlocks get updated in the `SparkListenerTaskEnd` event, through 
`event.taskMetrics.updatedBlocks`.  this makes its way to the current 
`StorageTab` in 
[`StorageListener`](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/storage/StorageTab.scala#L59),
 which then [filters down to the 
RDDBlocks](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/storage/StorageTab.scala#L50)
 via `BlockId.asRDDId`.
    
    So, this makes me wonder about two possible alternatives for unifying the 
code paths:
    
    1) Can you get the Broadcast block ids into the same `SparkListenerTaskEnd` 
event, which would eliminate the need to create a new event?  (In fact, maybe 
those block ids are already there?  I am still looking into the code for how 
those events get created ...)
    
    2) If we can't get Broadcast block ids into the same 
`SparkListenerTaskEnd`, does it make sense to change the RDD block tracking 
logic to do the same thing?  Yes this will be a bigger change, but it seems 
strange to have these two blocks tracked by such different mechanisms so it 
will lead to simpler code overall.  (But really that pushes me to option #1)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to