Hi Till,

Thanks for driving this topic. I think this FLIP is very important for enabling
local recovery [1] by default.

We previously took a similar approach, using a working directory so that the
local state directory is the same as the state backend's local directory, which
ensures that local recovery works well.

I noticed that this FLIP also wants to keep the working directory the same
across process failures, so that a restarted process can reuse the old one.
However, I think this might be problematic in a YARN environment. YARN selects
the local directories on all disks as 'LOCAL_DIRS', which are then used as
"io.tmp.dirs" [2]. To reuse the same old working directory, we would need to
always select the same directory from all disk candidates for a specific
resource. Thus, we might need to store the working directory location
persistently. If we instead use a hash or a similar method to calculate which
directory is always used as the working directory for a specific 'resource id',
we could run into problems when one of the disks is temporarily full or broken.
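To illustrate the concern, here is a minimal sketch of hash-based selection
among YARN's local directories. It is not Flink code: the class
WorkingDirSelection, the method selectWorkingDir, the "tm_" prefix and the
directory paths are all hypothetical. It shows that the mapping is only stable
as long as the list of candidate directories does not change:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class WorkingDirSelection {

    // Hypothetical helper (not a Flink API): deterministically maps a resource id
    // to one of the candidate local dirs (e.g. YARN's LOCAL_DIRS) by hashing the id.
    static Path selectWorkingDir(String resourceId, List<Path> localDirs) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(resourceId.hashCode(), localDirs.size());
        return localDirs.get(index).resolve("tm_" + resourceId);
    }

    public static void main(String[] args) {
        List<Path> dirs = List.of(
                Paths.get("/disk1/yarn"), Paths.get("/disk2/yarn"), Paths.get("/disk3/yarn"));
        String resourceId = "container_01";

        // Same id and same candidate list: a restarted process picks the same directory.
        Path before = selectWorkingDir(resourceId, dirs);
        System.out.println(before.equals(selectWorkingDir(resourceId, dirs))); // prints true

        // If a full or broken disk is excluded, the modulus changes, so this id (and
        // ids mapped to healthy disks) may now resolve to a different directory.
        List<Path> withoutDisk2 = List.of(dirs.get(0), dirs.get(2));
        System.out.println(selectWorkingDir(resourceId, withoutDisk2));
    }
}
```

Keeping the broken disk in the list instead would make the selection fail for
every id hashed to it, which is why persisting the chosen location may be the
more robust option.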



[1] https://issues.apache.org/jira/browse/FLINK-15507
[2] https://github.com/apache/flink/blob/cf1e8c39111378735e4c05a5edb3bd713229bb08/flink-core/src/main/java/org/apache/flink/configuration/CoreOptions.java#L363

Best,
Yun Tang
________________________________
From: Till Rohrmann <trohrm...@apache.org>
Sent: Saturday, December 11, 2021 0:54
To: dev <dev@flink.apache.org>
Subject: [DISCUSS] FLIP-198: Working directory for Flink processes

Hi everyone,

I would like to start a discussion about introducing an explicit working
directory for Flink processes that can be used to store information [1].
By default, this working directory will reside in the temporary directory
of the node Flink runs on. However, if configured to reside on a persistent
volume, then this information can be used to recover from process/node
failures. Moreover, such a working directory can be used to consolidate
some of our other directories Flink creates under /tmp (e.g. blobStorage,
RocksDB working directory).

Here is a draft PR that outlines the required changes [2].

Looking forward to your feedback.

[1] https://cwiki.apache.org/confluence/x/ZZiqCw
[2] https://github.com/apache/flink/pull/18083

Cheers,
Till
