[ https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138354#comment-14138354 ]

Sandy Ryza commented on SPARK-3577:
-----------------------------------

In the old code, the ShuffleWriteMetrics didn't get passed into the disk
writer. The disk writer itself tracked the bytes it wrote and how long it
took to write them, and those values were used to create the shuffle
metrics; ExternalSorter didn't interact with ShuffleWriteMetrics at all.
With the change, ExternalSorter creates its own ShuffleWriteMetrics,
because that object is now how it finds out how many bytes the disk writer
wrote. It might make sense to rename ShuffleWriteMetrics to something like
WriteMetrics or DiskWriteMetrics, though I think that was discussed and
rejected in the initial patch.
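
To make the wiring concrete, here is a minimal sketch (in Scala, Spark's
own language) of the flow described above. The names SketchWriteMetrics,
SketchDiskWriter, and SketchSorter are hypothetical stand-ins, not Spark's
actual classes, and the real disk I/O is elided:

    // Mutable accumulator playing the role of ShuffleWriteMetrics.
    class SketchWriteMetrics {
      var bytesWritten: Long = 0L
      var writeTimeNanos: Long = 0L
    }

    // After the change, the writer no longer tracks bytes/time privately;
    // it updates the metrics object handed to it by whoever created it.
    class SketchDiskWriter(metrics: SketchWriteMetrics) {
      def write(bytes: Array[Byte]): Unit = {
        val start = System.nanoTime()
        // ... actual disk write elided ...
        metrics.bytesWritten += bytes.length
        metrics.writeTimeNanos += System.nanoTime() - start
      }
    }

    // The sorter creates its own metrics object so it can read back how
    // many bytes its disk writers produced.
    class SketchSorter {
      val curWriteMetrics = new SketchWriteMetrics
      def newWriter(): SketchDiskWriter = new SketchDiskWriter(curWriteMetrics)
    }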

> Shuffle write time incorrect for sort-based shuffle
> ---------------------------------------------------
>
>                 Key: SPARK-3577
>                 URL: https://issues.apache.org/jira/browse/SPARK-3577
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Kay Ousterhout
>            Priority: Blocker
>
> After this change
> https://github.com/apache/spark/commit/4e982364426c7d65032e8006c63ca4f9a0d40470
> (cc [~sandyr] [~pwendell]) the ExternalSorter passes its own
> ShuffleWriteMetrics into the disk writer. The write time recorded in
> those metrics is never used -- meaning that when someone is using
> sort-based shuffle, the shuffle write time will be recorded as 0.
> [~sandyr] do you have time to take a look at this?
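
For illustration, the reported bug corresponds to an aggregation step like
the following hypothetical sketch (reusing the stand-in names from above):
the caller copies the byte count out of the sorter's metrics into the
task-level metrics, but never copies the write time, which therefore stays
at its initial value of 0.

    object MetricsAggregationSketch {
      def recordShuffleWrite(taskMetrics: SketchWriteMetrics,
                             sorter: SketchSorter): Unit = {
        // Bytes make it into the task-level metrics...
        taskMetrics.bytesWritten += sorter.curWriteMetrics.bytesWritten
        // ...but writeTimeNanos is never read, so sort-based shuffle
        // reports a shuffle write time of 0.
      }
    }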


