[ 
https://issues.apache.org/jira/browse/FLINK-27687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gaël Renoux updated FLINK-27687:
--------------------------------
    Summary: Flink shouldn't assume temp folders keep existing when unused  
(was: SpanningWrapper shouldn't assume temp folder exists)

> Flink shouldn't assume temp folders keep existing when unused
> -------------------------------------------------------------
>
>                 Key: FLINK-27687
>                 URL: https://issues.apache.org/jira/browse/FLINK-27687
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Network
>    Affects Versions: 1.14.4
>            Reporter: Gaël Renoux
>            Priority: Major
>
> SpanningWrapper.createSpillingChannel assumes that the folder in which it 
> creates the file exists. However, this is not the case in the following 
> scenario (which actually happened to us today):
>  * The temp folders were created a while ago (I assume on startup of the 
> task-manager) in the /tmp folder. They weren't used for a while, probably 
> because we didn't have any record big enough to trigger it.
>  * The cleanup cron for /tmp did its job and deleted those old folders in 
> /tmp.
>  * We deployed a new version of the job that actually needed the folders, and 
> it crashed.
> => I'm not sure whether it should be SpanningWrapper's responsibility to 
> create the folder if it no longer exists, but I'm not familiar enough with 
> Flink's internals to guess which class should do it. The problem 
> happened to us in SpanningWrapper, but it can probably happen in other places 
> as well.
> More generally, assuming that folders and files in /tmp won't get deleted at 
> some point doesn't seem correct to me. The [documentation for 
> io.tmp.dirs|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/]
>  recommends that it shouldn't be purged, but we do need to clean up at some 
> point. If Flink cannot tolerate such purges, then the documentation should 
> be updated to indicate that this is mandatory rather than a recommendation, 
> and that purges will break the jobs (not just trigger a recovery).
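
The defensive behaviour asked for above could look roughly like the following. This is a minimal sketch, not Flink's actual SpanningWrapper code: the class name SpillFileSketch, the method signature, the ".inputchannel" file suffix and the directory names are all assumptions made for illustration. The only point it shows is recreating the temp directory immediately before the spill file is opened, using the standard java.nio.file API.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.UUID;

// Hypothetical sketch; not the real Flink class.
class SpillFileSketch {

    /**
     * Opens a new spill file under tempDir, recreating tempDir first
     * if something (e.g. a /tmp cleanup cron) has deleted it.
     */
    static FileChannel createSpillingChannel(Path tempDir) throws IOException {
        // No-op when the directory already exists; otherwise creates it
        // together with any missing parent directories.
        Files.createDirectories(tempDir);

        // File name and suffix are illustrative assumptions.
        Path spillFile = tempDir.resolve(UUID.randomUUID() + ".inputchannel");
        return FileChannel.open(
                spillFile, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("spill-sketch");
        Path tempDir = base.resolve("example-spill-dir");

        // Simulate the scenario: the directory is created at startup,
        // then purged while unused.
        Files.createDirectories(tempDir);
        Files.delete(tempDir);

        try (FileChannel channel = createSpillingChannel(tempDir)) {
            System.out.println("directory recreated: " + Files.isDirectory(tempDir));
        }
    }
}
```

Whether this check belongs in SpanningWrapper itself or in a shared helper that hands out temp directories is exactly the open question in the description; the sketch only illustrates the check, not where it should live.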



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
