Gaël Renoux created FLINK-27687:
-----------------------------------
Summary: SpanningWrapper shouldn't assume temp folder exists
Key: FLINK-27687
URL: https://issues.apache.org/jira/browse/FLINK-27687
Project: Flink
Issue Type: New Feature
Components: Runtime / Network
Affects Versions: 1.14.4
Reporter: Gaël Renoux
SpanningWrapper.createSpillingChannel assumes that the folder in which it
creates the file exists. However, this is not the case in the following
scenario (which actually happened to us today):
* The temp folders were created a while ago (I assume on startup of the
task-manager) in the /tmp folder. They weren't used for a while, probably
because we didn't have any record big enough to trigger spilling.
* The cleanup cron for /tmp did its job and deleted those old folders in /tmp.
* We deployed a new version of the job that actually needed the folders, and
it crashed.
=> I'm not sure it should be SpanningWrapper's responsibility to create the
folder if it no longer exists, but I'm not familiar enough with Flink's
internals to guess which class should do it. We hit the problem in
SpanningWrapper, but it can probably happen in other places as well.
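To illustrate the kind of defensive check meant here, below is a minimal sketch and not Flink's actual code. The class SpillDirSketch, the method openSpillFile, and the folder and file names are all hypothetical. It only assumes that the spill file is opened as a FileChannel inside a configured temp folder, and it recreates that folder first if it has disappeared.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Flink.
class SpillDirSketch {

    // Opens a new file in tempDir, recreating tempDir (and any missing
    // parents) first. Files.createDirectories does nothing if the
    // directory already exists.
    static FileChannel openSpillFile(Path tempDir, String fileName) throws IOException {
        Files.createDirectories(tempDir);
        Path file = tempDir.resolve(fileName);
        return FileChannel.open(
                file,
                StandardOpenOption.CREATE_NEW,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE);
    }

    public static void main(String[] args) throws IOException {
        // Simulate the scenario: the folder is created at startup...
        Path base = Files.createTempDirectory("spill-sketch");
        Path tempDir = base.resolve("flink-tmp-example"); // made-up folder name
        Files.createDirectories(tempDir);

        // ...then removed by a cleanup job before it is ever used.
        Files.delete(tempDir);

        // Opening the file still succeeds because the folder is recreated.
        try (FileChannel channel = openSpillFile(tempDir, "example.spill")) {
            System.out.println("spill file created: " + channel.isOpen());
        }
    }
}
```

Whether such a check belongs in SpanningWrapper or in whichever component hands out the temp folders is exactly the open question above.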
More generally, assuming that folders and files in /tmp will never be deleted
doesn't seem correct to me. The [documentation for
io.tmp.dirs|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/]
recommends that it shouldn't be purged, but we do need to clean up at some
point. If purging really is not supported, then the documentation should be
updated to say that this is mandatory rather than a recommendation, and that
purges will break the jobs (not just trigger a recovery).