Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23207
  
    Can you share some ideas about it? IMO shuffle write metrics are hard to attribute, as an RDD can have shuffle dependencies on multiple upstream RDDs. That is, in general the shuffle write metrics should belong to the upstream RDDs.
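    For reference, a minimal sketch of the multi-dependency case (plain Spark core in local mode; `cogroup` is just a convenient way to build one RDD over two shuffled parents):
    
    ```scala
    import org.apache.spark.{ShuffleDependency, SparkConf, SparkContext}
    
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("shuffle-deps-demo"))
    
    val a = sc.parallelize(Seq((1, "x"), (2, "y")))
    val b = sc.parallelize(Seq((1, 10), (3, 30)))
    
    // cogroup builds a CoGroupedRDD (wrapped in a mapValues) with one
    // ShuffleDependency per upstream RDD being shuffled.
    val cogrouped = a.cogroup(b)
    val coGroupedRdd = cogrouped.dependencies.head.rdd // unwrap the mapValues layer
    
    coGroupedRdd.dependencies.foreach {
      case d: ShuffleDependency[_, _, _] =>
        // The shuffle files are written by the map tasks of `a` and `b`,
        // so the write metrics naturally belong to those upstream RDDs.
        println(s"shuffle dependency on upstream RDD ${d.rdd.id}")
      case other =>
        println(s"narrow dependency: $other")
    }
    
    sc.stop()
    ```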
    
    In Spark SQL, it's a little simpler, as a `ShuffledRowRDD` always has exactly one child, so it's reasonable to say that the shuffle write metrics belong to the `ShuffledRowRDD`.
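    A quick way to see this (illustrative snippet that just inspects the RDD lineage of a query containing an exchange; no job is actually run):
    
    ```scala
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.SparkSession
    
    val spark = SparkSession.builder()
      .master("local[2]").appName("shuffled-row-rdd-demo").getOrCreate()
    import spark.implicits._
    
    // groupBy forces an exchange, so the plan's RDD lineage contains a ShuffledRowRDD.
    val df = Seq((1, "a"), (2, "b"), (1, "c")).toDF("k", "v").groupBy("k").count()
    
    // Walk the lineage and report how many dependencies the ShuffledRowRDD
    // has (matched by class name, since the class is internal to Spark SQL).
    def visit(r: RDD[_]): Unit = {
      if (r.getClass.getSimpleName == "ShuffledRowRDD") {
        println(s"ShuffledRowRDD ${r.id} has ${r.dependencies.size} dependency(ies)")
      }
      r.dependencies.foreach(d => visit(d.rdd))
    }
    visit(df.queryExecution.toRdd)
    
    spark.stop()
    ```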
    
    The downside is that we need to design a not-so-general shuffle write metrics API in Spark core, which will only be used by Spark SQL.
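    
    For example, the API could look something like this (a hypothetical sketch, not necessarily what this PR implements; `ShuffleWriteMetricsReporter` and the SQL-side class are made-up names): core exposes a narrow reporter interface that the shuffle writers call, and Spark SQL plugs in an implementation backed by its `SQLMetric` accumulators.
    
    ```scala
    import org.apache.spark.sql.execution.metric.SQLMetric
    
    // Hypothetical core-side hook: shuffle writers report through this
    // interface instead of updating TaskMetrics directly.
    trait ShuffleWriteMetricsReporter {
      def incBytesWritten(v: Long): Unit
      def incRecordsWritten(v: Long): Unit
      def incWriteTime(v: Long): Unit
    }
    
    // Hypothetical SQL-side implementation: forwards every update into the
    // SQLMetrics shown on the ShuffledRowRDD's exchange node in the UI.
    class SQLShuffleWriteMetricsReporter(
        bytesWritten: SQLMetric,
        recordsWritten: SQLMetric,
        writeTime: SQLMetric) extends ShuffleWriteMetricsReporter {
      override def incBytesWritten(v: Long): Unit = bytesWritten.add(v)
      override def incRecordsWritten(v: Long): Unit = recordsWritten.add(v)
      override def incWriteTime(v: Long): Unit = writeTime.add(v)
    }
    ```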

