[ 
https://issues.apache.org/jira/browse/SPARK-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-8966:
-----------------------------
    Issue Type: Improvement  (was: Sub-task)
        Parent:     (was: SPARK-9697)

> Design a mechanism to ensure that temporary files created in tasks are 
> cleaned up after failures
> ------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-8966
>                 URL: https://issues.apache.org/jira/browse/SPARK-8966
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Josh Rosen
>
> It's important to avoid leaking temporary files, such as spill files created
> by the external sorter. Individual operators should still make an effort to
> clean up their own files and handle their own errors, but I think that we
> should add a safety-net mechanism that tracks file creation on a per-task
> basis and automatically cleans up leaked files.
>
> During tests, this mechanism should throw an exception when a leak is
> detected. In production deployments, it should log a warning and clean up the
> leak itself. This is similar to the TaskMemoryManager's leak detection and
> cleanup code.
>
> We may be able to implement this via a convenience method that registers task
> completion handlers with TaskContext.
>
> We might also explore techniques that cause files to be cleaned up
> automatically when their file descriptors are closed (e.g. by calling unlink
> on an open file). These techniques should not be our last line of defense
> against file resource leaks, though, since they might be platform-specific
> and may clean up resources later than we'd like.
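
For illustration only, the per-task safety net described above could be sketched roughly as follows. This is not Spark code: the class name TaskTempFileTracker, its methods, and the failOnLeak flag are all hypothetical, and the sketch uses only the JDK so that it runs on its own. It assumes one tracker per task, with failOnLeak set to true in tests and false in production.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical per-task registry of temporary files; not an existing Spark API. */
final class TaskTempFileTracker {
    // Assumption: true in tests (fail loudly), false in production (warn only).
    private final boolean failOnLeak;
    private final Set<Path> tracked = new LinkedHashSet<>();

    TaskTempFileTracker(boolean failOnLeak) {
        this.failOnLeak = failOnLeak;
    }

    /** Creates a temp file and records it so the safety net can find it later. */
    synchronized Path createTempFile(String prefix) {
        try {
            Path file = Files.createTempFile(prefix, ".tmp");
            tracked.add(file);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Called by a well-behaved operator that cleans up its own file. */
    synchronized void release(Path file) {
        try {
            Files.deleteIfExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        tracked.remove(file);
    }

    /**
     * Intended to run from a task-completion callback. Deletes every tracked
     * file that still exists, then either throws (test mode) or logs a
     * warning (production mode). Returns the leaked paths it deleted.
     */
    synchronized List<Path> cleanUpOnTaskCompletion() {
        List<Path> leaked = new ArrayList<>();
        for (Path file : tracked) {
            try {
                if (Files.deleteIfExists(file)) {
                    leaked.add(file);
                }
            } catch (IOException e) {
                System.err.println("WARN: could not delete " + file + ": " + e);
            }
        }
        tracked.clear();
        if (!leaked.isEmpty()) {
            if (failOnLeak) {
                throw new IllegalStateException("Leaked temporary files: " + leaked);
            }
            System.err.println("WARN: cleaned up leaked temporary files: " + leaked);
        }
        return leaked;
    }
}
```

In Spark itself, cleanUpOnTaskCompletion() would presumably be invoked from a completion listener registered on the task's TaskContext, as the description suggests. Note that even in the strict mode the sketch deletes the leaked files before throwing, so a failing test does not leave them behind.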



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
