[ https://issues.apache.org/jira/browse/SPARK-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-8966:
-----------------------------
    Issue Type: Improvement  (was: Sub-task)
        Parent:     (was: SPARK-9697)

> Design a mechanism to ensure that temporary files created in tasks are cleaned up after failures
> ------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-8966
>                 URL: https://issues.apache.org/jira/browse/SPARK-8966
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Josh Rosen
>
> It's important to avoid leaking temporary files, such as spill files created
> by the external sorter. Individual operators should still make an effort to
> clean up their own files / perform their own error handling, but I think that
> we should add a safety-net mechanism to track file creation on a per-task
> basis and automatically clean up leaked files.
>
> During tests, this mechanism should throw an exception when a leak is
> detected. In production deployments, it should log a warning and clean up the
> leak itself. This is similar to the TaskMemoryManager's leak detection and
> cleanup code.
>
> We may be able to implement this via a convenience method that registers task
> completion handlers with TaskContext.
>
> We might also explore techniques that will cause files to be cleaned up
> automatically when their file descriptors are closed (e.g. by calling unlink
> on an open file). These techniques should not be our last line of defense
> against file resource leaks, though, since they might be platform-specific
> and may clean up resources later than we'd like.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
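
As a rough illustration of the safety net described in the issue, the following Python sketch tracks temporary files per task and sweeps up whatever is still on disk when the task finishes. It is not Spark code: TaskFileTracker, TempFileLeakError, create_temp_file, release, on_task_completion, and the strict flag are all hypothetical names invented for this example. A real implementation would presumably be driven by TaskContext's task completion handlers rather than called by hand. The strict flag stands in for the test-mode behavior (raise on a leak); otherwise the sketch logs a warning. Deleting the leaked files in both modes before reporting is an assumption of this sketch, not something the issue specifies.

```python
import logging
import os
import tempfile

logger = logging.getLogger("task_file_tracker")


class TempFileLeakError(RuntimeError):
    """Raised in strict (test) mode when a task leaves temporary files behind."""


class TaskFileTracker:
    """Hypothetical per-task registry of temporary files (not a Spark API)."""

    def __init__(self, task_id, strict=False):
        self.task_id = task_id
        self.strict = strict  # True in tests: fail loudly when a leak is found
        self._files = set()

    def create_temp_file(self, suffix=".spill"):
        # Create the file and remember its path so it can be checked later.
        fd, path = tempfile.mkstemp(suffix=suffix)
        os.close(fd)
        self._files.add(path)
        return path

    def release(self, path):
        # Used by a well-behaved operator to delete its own file and
        # remove it from the registry.
        if os.path.exists(path):
            os.remove(path)
        self._files.discard(path)

    def on_task_completion(self):
        # Safety net: anything still registered and still on disk is a leak.
        leaked = sorted(p for p in self._files if os.path.exists(p))
        for path in leaked:
            os.remove(path)
        self._files.clear()
        if leaked:
            message = "task %s leaked %d temporary file(s)" % (
                self.task_id,
                len(leaked),
            )
            if self.strict:
                raise TempFileLeakError(message)
            logger.warning(message)
        return leaked
```

In this sketch an operator would call create_temp_file() for each spill file and release() once it is done with it. The task runner would call on_task_completion() exactly once per task, on both success and failure, so that a file left behind by a failed operator is still removed.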