Kay Ousterhout created SPARK-5845:
-------------------------------------

             Summary: Time to cleanup intermediate shuffle files not included in shuffle write time
                 Key: SPARK-5845
                 URL: https://issues.apache.org/jira/browse/SPARK-5845
             Project: Spark
          Issue Type: Bug
          Components: Shuffle
    Affects Versions: 1.2.1, 1.3.0
            Reporter: Kay Ousterhout
            Priority: Minor

When the disk is contended, I've observed cases where it takes as long as 7 seconds to clean up all of the intermediate spill files for a shuffle (when using the sort-based shuffle, but bypassing merging because there are <=200 shuffle partitions). This happens even when the shuffle data is not large (152MB written by one of the tasks where I observed this). The cleanup is effectively part of the shuffle write time (because it's a necessary side effect of writing data to disk), so it should be added to the shuffle write time to facilitate debugging.
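
To illustrate the accounting being requested, here is a minimal sketch in Python rather than Spark's own code. All names (WriteMetrics, write_partition_files, cleanup, run_demo) are hypothetical and are not Spark APIs. The sketch only assumes that a task writes one intermediate file per partition and later deletes those files, and it shows the deletion being timed and added to the same write-time counter as the writes.

```python
import os
import tempfile
import time
from dataclasses import dataclass


@dataclass
class WriteMetrics:
    """Hypothetical stand-in for a task's shuffle write metrics (not a Spark class)."""
    write_time_ns: int = 0

    def timed(self, fn, *args):
        """Run fn(*args) and add its elapsed wall-clock time to write_time_ns."""
        start = time.perf_counter_ns()
        try:
            return fn(*args)
        finally:
            self.write_time_ns += time.perf_counter_ns() - start


def write_partition_files(directory, num_partitions, payload):
    """Write one intermediate file per partition and return the file paths."""
    paths = []
    for i in range(num_partitions):
        path = os.path.join(directory, f"part-{i}.tmp")
        with open(path, "wb") as f:
            f.write(payload)
        paths.append(path)
    return paths


def cleanup(paths):
    """Delete the intermediate files; on a contended disk this can be slow."""
    for path in paths:
        os.remove(path)


def run_demo(num_partitions=4, payload=b"x" * 1024):
    """Write and clean up intermediate files, charging both to the write time."""
    metrics = WriteMetrics()
    with tempfile.TemporaryDirectory() as directory:
        paths = metrics.timed(write_partition_files, directory, num_partitions, payload)
        time_after_write = metrics.write_time_ns
        # The point of the issue: time the cleanup too, instead of leaving it unaccounted.
        metrics.timed(cleanup, paths)
        leftover = [p for p in paths if os.path.exists(p)]
    return metrics, time_after_write, leftover
```

With this shape, a slow cleanup shows up in the reported write time instead of appearing as unexplained task time. Whether the real fix should wrap the deletion this way or record it some other way is left open here.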