Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23207#discussion_r238909822
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/metrics.scala ---
    @@ -50,3 +50,57 @@ private[spark] trait ShuffleWriteMetricsReporter {
       private[spark] def decBytesWritten(v: Long): Unit
       private[spark] def decRecordsWritten(v: Long): Unit
     }
    +
    +
    +/**
    + * A proxy class of ShuffleWriteMetricsReporter that forwards all metrics
    + * updates to the input reporters.
    + */
    +private[spark] class GroupedShuffleWriteMetricsReporter(
    --- End diff --
    
    For the write metrics it's different: the default reporter calls the SQL
one, which means we need to hack the default one to register external reporters.
    
    Maybe we should not change the read side, and instead just create a special
`PairShuffleWriteMetricsReporter` that updates both the SQL reporter and the
default reporter.
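    A rough sketch of that first idea (all names here are hypothetical, not Spark's actual API): a pair reporter that forwards every metrics update to both delegates, leaving the read side untouched.

    ```scala
    // Hypothetical sketch: a minimal reporter trait plus a pair reporter that
    // forwards each update to both the SQL and the default reporter.
    trait ShuffleWriteMetricsReporter {
      def incBytesWritten(v: Long): Unit
      def incRecordsWritten(v: Long): Unit
    }

    // Simple accumulating reporter standing in for both delegates.
    class RecordingReporter extends ShuffleWriteMetricsReporter {
      var bytes = 0L
      var records = 0L
      override def incBytesWritten(v: Long): Unit = bytes += v
      override def incRecordsWritten(v: Long): Unit = records += v
    }

    // Forwards every update to both delegates.
    class PairShuffleWriteMetricsReporter(
        sqlReporter: ShuffleWriteMetricsReporter,
        defaultReporter: ShuffleWriteMetricsReporter)
      extends ShuffleWriteMetricsReporter {
      override def incBytesWritten(v: Long): Unit = {
        sqlReporter.incBytesWritten(v)
        defaultReporter.incBytesWritten(v)
      }
      override def incRecordsWritten(v: Long): Unit = {
        sqlReporter.incRecordsWritten(v)
        defaultReporter.incRecordsWritten(v)
      }
    }

    val sql = new RecordingReporter
    val default = new RecordingReporter
    val pair = new PairShuffleWriteMetricsReporter(sql, default)
    pair.incBytesWritten(10L)
    pair.incRecordsWritten(1L)
    ```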
    
    Another idea: `ShuffleDependency` carries a `reporter => reporter` function
instead of a reporter. Then we can create a SQL reporter that takes another
reporter (similar to the read side), and put the SQL reporter's constructor in
`ShuffleDependency`.
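    A sketch of this second idea (again, all names hypothetical): the dependency carries the reporter constructor, and the shuffle writer applies it to the default reporter at write time, so the SQL reporter wraps the default one much like the read side does.

    ```scala
    // Hypothetical sketch: ShuffleDependency carries a reporter => reporter
    // function rather than a reporter instance.
    trait ShuffleWriteMetricsReporter {
      def incBytesWritten(v: Long): Unit
    }

    // Default reporter accumulates locally.
    class DefaultReporter extends ShuffleWriteMetricsReporter {
      var bytes = 0L
      override def incBytesWritten(v: Long): Unit = bytes += v
    }

    // SQL reporter takes another reporter and forwards to it after recording.
    class SQLWriteMetricsReporter(delegate: ShuffleWriteMetricsReporter)
      extends ShuffleWriteMetricsReporter {
      var sqlBytes = 0L
      override def incBytesWritten(v: Long): Unit = {
        sqlBytes += v
        delegate.incBytesWritten(v)
      }
    }

    // The dependency carries the constructor, not a reporter instance.
    class ShuffleDependencySketch(
        val createReporter: ShuffleWriteMetricsReporter => ShuffleWriteMetricsReporter)

    val dep = new ShuffleDependencySketch(new SQLWriteMetricsReporter(_))
    val defaultReporter = new DefaultReporter
    val reporter = dep.createReporter(defaultReporter) // wrap at write time
    reporter.incBytesWritten(42L)
    ```

    The upside is that the default reporter stays untouched and the wrapping happens exactly where the writer already has the default reporter in hand.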


---
