[ https://issues.apache.org/jira/browse/SPARK-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kay Ousterhout updated SPARK-2571:
----------------------------------

https://github.com/apache/spark/commit/7b971b91caeebda57f1506ffc4fd266a1b379290

> Shuffle read bytes are reported incorrectly for stages with multiple shuffle dependencies
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-2571
>                 URL: https://issues.apache.org/jira/browse/SPARK-2571
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 1.0.1, 0.9.3
>            Reporter: Kay Ousterhout
>            Assignee: Kay Ousterhout
>             Fix For: 1.0.2
>
>
> In BlockStoreShuffleFetcher, we set the shuffle metrics for a task to include
> information about data fetched from one BlockFetcherIterator. When tasks
> have multiple shuffle dependencies (e.g., a stage that joins two datasets
> together), the metrics will get set based on data fetched from the last
> BlockFetcherIterator to complete, rather than the sum of all data fetched
> from all BlockFetcherIterators. This can lead to dramatically underreporting
> the shuffle read bytes.
> Thanks [~andrewor14] and [~rxin] for helping to diagnose this issue.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
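
The underreporting described in the issue can be illustrated with a small, simplified model. This is only a sketch and not Spark's actual code: the names `ShuffleReadMetrics`, `TaskMetrics`, `report_overwriting`, `report_accumulating`, and `run_task` are hypothetical stand-ins, and each list entry is assumed to be the bytes fetched for one shuffle dependency, in the order the fetches complete.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ShuffleReadMetrics:
    """Hypothetical stand-in for a task's shuffle read metrics."""
    remote_bytes_read: int = 0


@dataclass
class TaskMetrics:
    """Hypothetical stand-in for the metrics object attached to a task."""
    shuffle_read: Optional[ShuffleReadMetrics] = None


def report_overwriting(task: TaskMetrics, fetched_bytes: int) -> None:
    # Pattern described in the issue: each completed fetch replaces the
    # task's metrics, so only the last fetch to complete is visible.
    task.shuffle_read = ShuffleReadMetrics(remote_bytes_read=fetched_bytes)


def report_accumulating(task: TaskMetrics, fetched_bytes: int) -> None:
    # One way to avoid the problem (an assumption, not necessarily the
    # committed fix): add each fetch's bytes to a running total.
    if task.shuffle_read is None:
        task.shuffle_read = ShuffleReadMetrics()
    task.shuffle_read.remote_bytes_read += fetched_bytes


def run_task(report: Callable[[TaskMetrics, int], None],
             bytes_per_dependency: List[int]) -> int:
    """Simulate one task that reads from several shuffle dependencies."""
    task = TaskMetrics()
    for fetched in bytes_per_dependency:
        report(task, fetched)
    return task.shuffle_read.remote_bytes_read if task.shuffle_read else 0


if __name__ == "__main__":
    # A join stage with two shuffle dependencies of 900 and 100 bytes.
    print("overwriting: ", run_task(report_overwriting, [900, 100]))
    print("accumulating:", run_task(report_accumulating, [900, 100]))
```

In this model, a join whose two dependencies fetch 900 and 100 bytes reports only 100 bytes under the overwriting pattern, while the accumulating variant reports the full 1000.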