Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23207#discussion_r238633725
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala ---
    @@ -92,6 +92,12 @@ private[spark] class ShuffleMapTask(
           threadMXBean.getCurrentThreadCpuTime - deserializeStartCpuTime
         } else 0L
     
    +    // Register the shuffle write metrics reporter to shuffleWriteMetrics.
    +    if (dep.shuffleWriteMetricsReporter.isDefined) {
    +      context.taskMetrics().shuffleWriteMetrics.registerExternalShuffleWriteReporter(
    --- End diff ---
    
    a simpler idea:
    1. create a `class GroupedShuffleWriteMetricsReporter(reporters: Seq[ShuffleWriteMetricsReporter]) extends ShuffleWriteMetricsReporter`, which proxies all metric updates to the input reporters.
    2. create a `GroupedShuffleWriteMetricsReporter` instance here: `new GroupedShuffleWriteMetricsReporter(Seq(dep.shuffleWriteMetricsReporter.get, context.taskMetrics().shuffleWriteMetrics))`, and pass it to `manager.getWriter` (see the sketch below).
    
    I think we can use the same approach for read metrics as well.
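    
    To make the proposal concrete, here is a minimal sketch of such a proxy. It assumes the `ShuffleWriteMetricsReporter` trait exposes per-metric update methods like `incBytesWritten`, `incRecordsWritten`, `incWriteTime` and their `dec*` counterparts; the exact method set and visibility are an assumption, not the final API.
    
    ```scala
    // Hypothetical sketch: fans each metric update out to every wrapped reporter,
    // so the external reporter and TaskMetrics both observe the same updates.
    private[spark] class GroupedShuffleWriteMetricsReporter(
        reporters: Seq[ShuffleWriteMetricsReporter])
      extends ShuffleWriteMetricsReporter {
    
      override private[spark] def incBytesWritten(v: Long): Unit =
        reporters.foreach(_.incBytesWritten(v))
      override private[spark] def decBytesWritten(v: Long): Unit =
        reporters.foreach(_.decBytesWritten(v))
      override private[spark] def incRecordsWritten(v: Long): Unit =
        reporters.foreach(_.incRecordsWritten(v))
      override private[spark] def decRecordsWritten(v: Long): Unit =
        reporters.foreach(_.decRecordsWritten(v))
      override private[spark] def incWriteTime(v: Long): Unit =
        reporters.foreach(_.incWriteTime(v))
    }
    ```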

