Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23207

Can you share some ideas about it? IMO shuffle write metrics are hard, as an RDD can have shuffle dependencies on multiple upstream RDDs, so in general the shuffle write metrics belong to the upstream RDDs. In Spark SQL it's a little simpler: a `ShuffledRowRDD` always has exactly one child, so it's reasonable to say that the shuffle write metrics belong to the `ShuffledRowRDD`. That said, we would need to design a not-so-general shuffle write metrics API in Spark core that would only be used by Spark SQL.
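To make the idea concrete, here is a minimal sketch of what such an API could look like: Spark core exposes a narrow reporter interface that the shuffle writer calls, and Spark SQL supplies an implementation that routes the updates into its own metrics for the `ShuffledRowRDD`'s single child. All names and signatures below are illustrative assumptions, not the actual Spark API from this PR.

```scala
// Hypothetical core-side interface: the shuffle writer reports into this,
// without knowing anything about SQL metrics. (Illustrative, not Spark's API.)
trait ShuffleWriteMetricsReporter {
  def incBytesWritten(v: Long): Unit
  def incRecordsWritten(v: Long): Unit
}

// Hypothetical SQL-side implementation: accumulates the values so they can be
// attributed to the single child plan of a ShuffledRowRDD.
class SQLShuffleWriteMetricsReporter extends ShuffleWriteMetricsReporter {
  private var bytes = 0L
  private var records = 0L
  override def incBytesWritten(v: Long): Unit = bytes += v
  override def incRecordsWritten(v: Long): Unit = records += v
  def bytesWritten: Long = bytes
  def recordsWritten: Long = records
}

object Demo extends App {
  val reporter = new SQLShuffleWriteMetricsReporter
  // Simulate a shuffle writer reporting two record batches.
  reporter.incBytesWritten(1024L)
  reporter.incRecordsWritten(10L)
  reporter.incBytesWritten(512L)
  reporter.incRecordsWritten(5L)
  println(s"bytes=${reporter.bytesWritten} records=${reporter.recordsWritten}")
}
```

The design point the sketch illustrates is the one in the comment: the interface lives in Spark core because the shuffle writer runs there, but it is shaped around the SQL case where a single downstream RDD owns the metrics.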