Thanks Till for creating this FLIP.

I believe the feature is really useful for standalone K8s deployments with
persistent volumes. For native K8s and YARN deployments, however, the
Flink ResourceManager will create a new TaskManager with a new resource ID,
so those deployments still could not benefit from this FLIP.

Moreover, I am curious about the clean-up mechanism. Will the working
directory be deleted once the Flink job reaches a globally terminal
state? Or does it need to be deleted externally?


Best,
Yang


Yun Tang <myas...@live.com> wrote on Sat, Dec 11, 2021 at 10:48 PM:

> Hi Till,
>
> Thanks for driving this topic. I think this FLIP is very important for
> letting us enable local recovery [1] by default.
>
> We previously took a similar approach: we set up the working directory so
> that the local state directory is the same as the state backend's local
> directory, to ensure that local recovery works well.
>
> I noticed that this FLIP also wants to keep the working directory the same
> even across process failures, so that a restarted process can pick up the
> old one. However, I think there might be some problems in a YARN
> environment. YARN selects all the local directories on different disks as
> 'LOCAL_DIRS' to represent "io.tmp.dirs" [2]. To allow reuse of the same
> old working directory, we need to always select the same directory from all
> disk candidates for a specific resource. Thus, we might need to store the
> working directory location persistently. If we use a hash or a similar
> method to calculate which directory is always used as the working directory
> for a specific 'resource id', it might run into problems if one of the
> disks is temporarily full or broken.
>
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-15507
> [2]
> https://github.com/apache/flink/blob/cf1e8c39111378735e4c05a5edb3bd713229bb08/flink-core/src/main/java/org/apache/flink/configuration/CoreOptions.java#L363
>
> Best
> Yun Tang
> ________________________________
> From: Till Rohrmann <trohrm...@apache.org>
> Sent: Saturday, December 11, 2021 0:54
> To: dev <dev@flink.apache.org>
> Subject: [DISCUSS] FLIP-198: Working directory for Flink processes
>
> Hi everyone,
>
> I would like to start a discussion about introducing an explicit working
> directory for Flink processes that can be used to store information [1].
> By default, this working directory will reside in the temporary directory
> of the node Flink runs on. However, if configured to reside on a persistent
> volume, then this information can be used to recover from process/node
> failures. Moreover, such a working directory can be used to consolidate
> some of the other directories Flink creates under /tmp (e.g. blobStorage,
> RocksDB working directory).
>
> Here is a draft PR that outlines the required changes [2].
>
> Looking forward to your feedback.
>
> [1] https://cwiki.apache.org/confluence/x/ZZiqCw
> [2] https://github.com/apache/flink/pull/18083
>
> Cheers,
> Till
>
