[ https://issues.apache.org/jira/browse/SPARK-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kay Ousterhout updated SPARK-5845:
----------------------------------
    Summary: Time to cleanup spilled shuffle files not included in shuffle write time  (was: Time to cleanup intermediate shuffle files not included in shuffle write time)

> Time to cleanup spilled shuffle files not included in shuffle write time
> ------------------------------------------------------------------------
>
>                 Key: SPARK-5845
>                 URL: https://issues.apache.org/jira/browse/SPARK-5845
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 1.3.0, 1.2.1
>            Reporter: Kay Ousterhout
>            Assignee: Ilya Ganelin
>            Priority: Minor
>
> When the disk is contended, I've observed cases when it takes as long as 7
> seconds to clean up all of the intermediate spill files for a shuffle (when
> using the sort based shuffle, but bypassing merging because there are <=200
> shuffle partitions). This is even when the shuffle data is non-huge (152MB
> written from one of the tasks where I observed this). This is effectively
> part of the shuffle write time (because it's a necessary side effect of
> writing data to disk) so should be added to the shuffle write time to
> facilitate debugging.
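
The accounting proposed in the description can be illustrated with a minimal sketch: time the deletion of the spill files and add the elapsed time to the same counter that tracks shuffle write time. This is not Spark's code. It is written in Python purely for illustration, and `ShuffleWriteMetrics`, `inc_write_time`, `delete_spill_files`, and `demo` are hypothetical names chosen for this example.

```python
import os
import tempfile
import time
from dataclasses import dataclass


@dataclass
class ShuffleWriteMetrics:
    """Hypothetical stand-in for a task's shuffle write metrics."""
    write_time_ns: int = 0

    def inc_write_time(self, delta_ns: int) -> None:
        self.write_time_ns += delta_ns


def delete_spill_files(paths, metrics):
    """Delete spill files and charge the elapsed time to shuffle write time."""
    start = time.perf_counter_ns()
    try:
        for path in paths:
            try:
                os.remove(path)
            except FileNotFoundError:
                pass  # file was already removed; nothing to clean up
    finally:
        # Record the cleanup time even if a deletion raised an error,
        # so that slow or failing cleanup still shows up in the metric.
        metrics.inc_write_time(time.perf_counter_ns() - start)


def demo():
    """Create a few fake spill files, clean them up, and return the results."""
    directory = tempfile.mkdtemp()
    paths = []
    for i in range(3):
        path = os.path.join(directory, f"spill-{i}.tmp")
        with open(path, "wb") as f:
            f.write(b"x" * 1024)
        paths.append(path)

    metrics = ShuffleWriteMetrics()
    delete_spill_files(paths, metrics)
    os.rmdir(directory)
    return metrics, paths


if __name__ == "__main__":
    metrics, _ = demo()
    print(f"write time including cleanup: {metrics.write_time_ns} ns")
```

Recording the time in a `finally` block is a design choice of this sketch: on a contended disk the deletion itself is the slow step, so the metric should capture it even when cleanup only partially succeeds.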