[ 
https://issues.apache.org/jira/browse/SPARK-49788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884785#comment-17884785
 ] 

Holden Karau commented on SPARK-49788:
--------------------------------------

Rough thoughts on how we could make a new and improved TTL mechanism:

In the BlockManagerMasterEndpoint, when we're asked for the locations of blocks, 
if any of them are shuffle blocks we could reset a timer for the associated 
shuffle ID. I'm not sure where the timer should live – it could be in the 
BlockManagerMasterEndpoint or in the ContextCleaner.

 

One possible reason to keep it in the BlockManagerMasterEndpoint is that we 
could expose the last-access-time information for dynamic scale-down selection 
(and maybe block migrations, although those would probably be better served by 
executor-side tracking of accesses).
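A minimal sketch of the last-access tracking described above, assuming it lives 
in the BlockManagerMasterEndpoint. ShuffleAccessTracker, recordAccess, and 
staleShuffles are hypothetical names, not Spark's actual API; timestamps are 
passed in explicitly so the TTL logic is easy to reason about and test:

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.jdk.CollectionConverters._

// Hypothetical tracker: records the last time each shuffle's blocks were
// looked up, so stale shuffles can be found, logged, and cleaned.
class ShuffleAccessTracker {
  private val lastAccess = new ConcurrentHashMap[Int, Long]()

  // Called whenever locations for a shuffle block are requested,
  // resetting the TTL "timer" for that shuffle ID.
  def recordAccess(shuffleId: Int, nowMs: Long): Unit =
    lastAccess.put(shuffleId, nowMs)

  // Shuffle IDs whose last access is older than ttlMs; the caller would
  // log each one on removal, which also helps users understand why a
  // shuffle disappeared.
  def staleShuffles(ttlMs: Long, nowMs: Long): Seq[Int] =
    lastAccess.asScala.collect {
      case (id, t) if nowMs - t > ttlMs => id
    }.toSeq

  // Last-access time, which could also feed scale-down selection.
  def lastAccessTime(shuffleId: Int): Option[Long] =
    Option(lastAccess.get(shuffleId))
}
```

A periodic sweep (e.g. on the endpoint's existing scheduler) would call 
staleShuffles and hand the results to the normal shuffle-removal path.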

 

> Add spark.cleaner.ttl functionality for long lived jobs
> -------------------------------------------------------
>
>                 Key: SPARK-49788
>                 URL: https://issues.apache.org/jira/browse/SPARK-49788
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Holden Karau
>            Assignee: Holden Karau
>            Priority: Major
>
> We should add a TTL (and maybe a threshold?) to clean shuffle files which 
> have been stored for longer than some fixed period of time. This would be 
> useful for long-lived jobs with Spark Connect and Notebooks, where objects 
> may not go out of scope and shuffle files may stay resident longer than 
> needed.
>  
> See SPARK-7689 which removed the original TTL cleaner.
>  
> To reduce the confusion and difficulty of understanding, which was part of 
> the original reason for removing the TTL-based cleaner, we should (instead of 
> a "raw" TTL) track when each shuffle was last accessed, reset the counter on 
> access, and log on removal.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
