[ https://issues.apache.org/jira/browse/SPARK-49788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Holden Karau updated SPARK-49788:
---------------------------------
    Description: 
We should add a TTL (and maybe a threshold?) to clean up shuffle files that have been stored for longer than some fixed period of time. This would be useful for long-lived jobs with Spark Connect and notebooks, where items may not go out of scope and may stay resident longer than needed.

See SPARK-7689, which removed the original TTL cleaner.

Confusion and difficulty understanding it were part of the original reason for removing the TTL-based cleaner. To reduce that risk, instead of a "raw" TTL we should keep track of when each shuffle was last accessed, reset the counter on every access, and log on removal.

  was:
We should add a TTL (and maybe a threshold?) to clean up shuffle files that have been stored for longer than some fixed period of time. This would be useful for long-lived jobs with Spark Connect and notebooks, where items may not go out of scope and may stay resident longer than needed.

See SPARK-7689, which removed the original TTL cleaner.


> Add spark.cleaner.ttl functionality for long lived jobs
> -------------------------------------------------------
>
>                 Key: SPARK-49788
>                 URL: https://issues.apache.org/jira/browse/SPARK-49788
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Holden Karau
>            Assignee: Holden Karau
>            Priority: Major
>
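The description proposes an expiry based on last access rather than a "raw" TTL. Below is a minimal Python sketch of that bookkeeping. It assumes shuffles are identified by integer IDs and that a caller-supplied `remove` callback deletes the files. `ShuffleTtlTracker`, `touch`, `expire` and `FakeClock` are hypothetical names, not Spark APIs, and nothing here reflects how Spark itself would implement the change.

```python
import logging
import time
from typing import Callable, Dict, List

logger = logging.getLogger("shuffle_ttl_sketch")


class ShuffleTtlTracker:
    """Hypothetical idle-TTL tracker: a shuffle expires only after it has gone
    unaccessed for longer than ttl_seconds (not a fixed time after creation)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._last_access: Dict[int, float] = {}  # shuffle ID -> last access time

    def touch(self, shuffle_id: int) -> None:
        """Record a registration or read; this resets the shuffle's idle counter."""
        self._last_access[shuffle_id] = self._clock()

    def expire(self, remove: Callable[[int], None]) -> List[int]:
        """Remove and log every shuffle idle for longer than the TTL."""
        now = self._clock()
        expired = [sid for sid, last in self._last_access.items()
                   if now - last > self.ttl_seconds]
        for sid in expired:
            idle = now - self._last_access.pop(sid)
            logger.info("Removing shuffle %d after %.1fs idle (ttl=%.1fs)",
                        sid, idle, self.ttl_seconds)
            remove(sid)  # caller-supplied deletion of the shuffle files
        return expired


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


logging.basicConfig(level=logging.INFO)
clock = FakeClock()
tracker = ShuffleTtlTracker(ttl_seconds=60, clock=clock)
removed: List[int] = []

tracker.touch(1)
tracker.touch(2)
clock.now = 50
tracker.touch(1)  # shuffle 1 is reused, so its idle counter restarts
clock.now = 90
expired = tracker.expire(removed.append)
print(expired)  # prints [2]
```

Because every access resets the timestamp, a shuffle that a notebook keeps reusing is never dropped. Under a raw TTL, the same shuffle would be removed a fixed time after it was created, even while still in use.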
-- 
This message was sent by Atlassian Jira
(v8.20.10#820010)