[ 
https://issues.apache.org/jira/browse/SPARK-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Ousterhout updated SPARK-2571:
----------------------------------


https://github.com/apache/spark/commit/7b971b91caeebda57f1506ffc4fd266a1b379290

> Shuffle read bytes are reported incorrectly for stages with multiple shuffle 
> dependencies
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-2571
>                 URL: https://issues.apache.org/jira/browse/SPARK-2571
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 1.0.1, 0.9.3
>            Reporter: Kay Ousterhout
>            Assignee: Kay Ousterhout
>             Fix For: 1.0.2
>
>
> In BlockStoreShuffleFetcher, we set the shuffle metrics for a task to include 
> information about data fetched from one BlockFetcherIterator.  When tasks 
> have multiple shuffle dependencies (e.g., a stage that joins two datasets 
> together), the metrics will get set based on data fetched from the last 
> BlockFetcherIterator to complete, rather than the sum of all data fetched 
> from all BlockFetcherIterators.  This can lead to dramatically underreporting 
> the shuffle read bytes.
> Thanks [~andrewor14] and [~rxin] for helping to diagnose this issue.
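
To illustrate the failure mode described above, here is a minimal toy sketch in Python. It is not Spark code: the function names and byte counts are hypothetical, and each integer stands in for the bytes fetched by one BlockFetcherIterator, listed in the order the iterators complete. It only contrasts "keep the value from the last fetch to complete" with "sum over all fetches", and makes no claim about how the linked commit implements the fix.

```python
from typing import List


def report_last_fetch_only(bytes_per_fetch: List[int]) -> int:
    """Model of the reported bug: each completed fetch overwrites the metric."""
    reported = 0
    for fetched in bytes_per_fetch:
        # Assignment, not accumulation: earlier fetches are forgotten.
        reported = fetched
    return reported


def report_sum_of_fetches(bytes_per_fetch: List[int]) -> int:
    """Model of the expected behaviour: the metric accumulates across fetches."""
    reported = 0
    for fetched in bytes_per_fetch:
        reported += fetched
    return reported


if __name__ == "__main__":
    # Hypothetical join stage with two shuffle dependencies: a large input
    # whose fetch completes first and a small one whose fetch completes last.
    fetches = [900, 100]
    print("overwrite:", report_last_fetch_only(fetches))  # overwrite: 100
    print("sum:", report_sum_of_fetches(fetches))         # sum: 1000
```

In this made-up case the overwriting version reports 100 bytes for a task that actually read 1000, which is the kind of underreporting the issue describes for join stages.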



--
This message was sent by Atlassian JIRA
(v6.2#6252)
