[ 
https://issues.apache.org/jira/browse/SPARK-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335667#comment-14335667
 ] 

Patrick Wendell commented on SPARK-5845:
----------------------------------------

[~kayousterhout], did you mean the time required to delete the spill files 
used during aggregation? The shuffle files themselves are deleted 
asynchronously, as [~ilganeli] has mentioned.

> Time to cleanup intermediate shuffle files not included in shuffle write time
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-5845
>                 URL: https://issues.apache.org/jira/browse/SPARK-5845
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 1.3.0, 1.2.1
>            Reporter: Kay Ousterhout
>            Assignee: Ilya Ganelin
>            Priority: Minor
>
> When the disk is contended, I've observed cases where it takes as long as 7 
> seconds to clean up all of the intermediate spill files for a shuffle (when 
> using the sort-based shuffle, but bypassing merging because there are <=200 
> shuffle partitions).  This happens even when the shuffle data is not large 
> (152MB written from one of the tasks where I observed this).  The cleanup is 
> effectively part of the shuffle write time (because it's a necessary side 
> effect of writing data to disk), so it should be added to the shuffle write 
> time to facilitate debugging.
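
To make the proposal in the description concrete, here is a minimal sketch of 
the accounting it asks for: time the deletion of the intermediate files and add 
that duration to the same counter that tracks write time. This is not Spark 
code. WriteMetrics, timed, write_partition_files, delete_files and run_write 
are hypothetical names made up for illustration, and the temporary files only 
stand in for per-partition spill files.

```python
import os
import tempfile
import time
from dataclasses import dataclass


@dataclass
class WriteMetrics:
    # Hypothetical stand-in for a task's shuffle write metrics.
    write_time_ns: int = 0
    bytes_written: int = 0


def timed(metrics, fn, *args):
    """Run fn(*args) and add its wall-clock duration to metrics.write_time_ns."""
    start = time.perf_counter_ns()
    try:
        return fn(*args)
    finally:
        metrics.write_time_ns += time.perf_counter_ns() - start


def write_partition_files(directory, partitions):
    """Write one intermediate file per partition and return the file paths."""
    paths = []
    for i, payload in enumerate(partitions):
        path = os.path.join(directory, f"part-{i}.tmp")
        with open(path, "wb") as f:
            f.write(payload)
        paths.append(path)
    return paths


def delete_files(paths):
    """Delete the intermediate files synchronously."""
    for path in paths:
        os.remove(path)


def run_write(partitions):
    """Write, then clean up, charging both phases to the write-time counter."""
    metrics = WriteMetrics()
    with tempfile.TemporaryDirectory() as directory:
        paths = timed(metrics, write_partition_files, directory, partitions)
        metrics.bytes_written = sum(len(p) for p in partitions)
        write_only_ns = metrics.write_time_ns
        # The cleanup is timed with the same counter, so a slow delete on a
        # contended disk shows up in the reported write time.
        timed(metrics, delete_files, paths)
        remaining = os.listdir(directory)
    return metrics, write_only_ns, remaining
```

Comparing metrics.write_time_ns with write_only_ns after a call to run_write 
shows how much of the reported time came from cleanup alone, which is the gap 
the description says is currently invisible.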



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
