[ https://issues.apache.org/jira/browse/FLINK-27687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gaël Renoux updated FLINK-27687:
--------------------------------
    Summary: Flink shouldn't assume temp folders keep existing when unused  (was: SpanningWrapper shouldn't assume temp folder exists)

> Flink shouldn't assume temp folders keep existing when unused
> -------------------------------------------------------------
>
>                 Key: FLINK-27687
>                 URL: https://issues.apache.org/jira/browse/FLINK-27687
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Network
>    Affects Versions: 1.14.4
>            Reporter: Gaël Renoux
>            Priority: Major
>
> SpanningWrapper.createSpillingChannel assumes that the folder in which
> it creates the file exists. However, this is not the case in the following
> scenario (which actually happened to us today):
> * The temp folders were created a while ago (I assume on startup of the
> task-manager) in the /tmp folder. They weren't used for a while, probably
> because we didn't have any record big enough to trigger spilling.
> * The cleanup cron for /tmp did its job and deleted those old folders in
> /tmp.
> * We deployed a new version of the job that actually needed the folders, and
> it crashed.
> => I'm not sure it should be SpanningWrapper's responsibility to create the
> folder if it doesn't exist anymore, but I'm not familiar enough with
> Flink's internals to guess which class should do it. We hit the problem
> in SpanningWrapper, but it can probably happen in other places
> as well.
> More generally, assuming that folders and files in /tmp will never get
> deleted doesn't seem correct to me. The [documentation for
> io.tmp.dirs|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/]
> recommends that it shouldn't be purged, but we do need to clean up at some
> point. If purging is really not supported, then the documentation should be
> updated to state that this is mandatory rather than a recommendation, and
> that purges will break the jobs (not just trigger a recovery).
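
To illustrate the behaviour the description asks for, here is a minimal sketch of recreating a purged temp folder just before the spill file is opened. It is not Flink's actual code: the class SpillFileSketch, the method openSpillFile, and the folder and file names are hypothetical and exist only for this example. It assumes that recreating the folder on demand is acceptable, which is exactly the open question raised in the issue.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; none of these names come from Flink.
final class SpillFileSketch {

    // Opens a new spill file in tempDir, recreating tempDir first if needed.
    static FileChannel openSpillFile(Path tempDir, String fileName) throws IOException {
        // Does nothing if the directory already exists; otherwise creates it
        // together with any missing parent directories.
        Files.createDirectories(tempDir);
        Path file = tempDir.resolve(fileName);
        // CREATE_NEW fails if the file already exists, so an old spill file
        // is never silently reused.
        return FileChannel.open(file, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("spill-sketch");
        Path spillDir = base.resolve("example-spill-dir"); // made-up folder name
        Files.createDirectories(spillDir);
        Files.delete(spillDir); // simulate the /tmp cleanup cron removing the folder

        try (FileChannel channel = openSpillFile(spillDir, "record.spill")) {
            System.out.println("spill file opened: " + channel.isOpen());
        }
    }
}
```

Without the createDirectories call, opening the file after the folder has been deleted fails with a NoSuchFileException, which matches the crash scenario described above.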
--
This message was sent by Atlassian Jira
(v8.20.7#820007)