Matthias Pohl created FLINK-31709:
-------------------------------------
Summary: JobResultStore and ExecutionGraphInfoStore could be merged
Key: FLINK-31709
URL: https://issues.apache.org/jira/browse/FLINK-31709
Project: Flink
Issue Type: New Feature
Components: Runtime / Coordination
Reporter: Matthias Pohl
This is an initial proposal for an improvement in the coordination layer:
The {{JobResultStore}} (JRS) was introduced as part of
[FLIP-194|https://cwiki.apache.org/confluence/display/FLINK/FLIP-194%3A+introduce+the+jobresultstore].
For now, it only stores the JobResult. Through the JRS, jobs can be marked as
finished even when the JobManager fails and the information from the
{{ExecutionGraphInfoStore}} is lost (see FLINK-11813).
While implementing {{FLIP-194}}, it became apparent that there is some
redundancy between the JRS and the {{ExecutionGraphInfoStore}}. Both components
store metadata about a finished job. The {{ExecutionGraphInfoStore}}
is used to make information about the finished job available in user-facing
APIs (REST, web-UI). The JRS is used to expose the job's state to the cleanup
logic and stores limited data.
This proposal is about merging the two and making the
{{ArchivedExecutionGraph}} information available even after a JobManager is
restarted. That way, completed jobs can still be listed in the job overview
after a Flink cluster restart. Additionally, we could provide the last
checkpoint information. The JRS would be a way to access this information even
after the Flink cluster is shut down. The latter feature would also be a way to
improve the Flink Kubernetes Operator's latest-state handling.
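
To make the idea more concrete, a merged store could look roughly like the following. This is only a minimal sketch under stated assumptions, not existing Flink code: {{MergedJobStoreSketch}}, {{FinishedJob}} and all of their fields and methods are hypothetical names, and the in-memory map merely stands in for whatever durable storage a real implementation would need in order to survive a JobManager restart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch only; none of these types are Flink classes. */
final class MergedJobStoreSketch {

    /** Stand-in for the combined data kept for one finished job. */
    static final class FinishedJob {
        final String jobId;
        final String finalStatus;        // the limited data the cleanup logic needs
        final String jobName;            // the kind of data shown via REST and the web UI
        final String lastCheckpointPath; // null if the job never completed a checkpoint
        final boolean cleanedUp;

        FinishedJob(String jobId, String finalStatus, String jobName,
                    String lastCheckpointPath, boolean cleanedUp) {
            this.jobId = jobId;
            this.finalStatus = finalStatus;
            this.jobName = jobName;
            this.lastCheckpointPath = lastCheckpointPath;
            this.cleanedUp = cleanedUp;
        }

        /** Returns a copy of this entry with the cleanup flag set. */
        FinishedJob asCleanedUp() {
            return new FinishedJob(jobId, finalStatus, jobName, lastCheckpointPath, true);
        }
    }

    // In-memory only, to keep the sketch runnable; this map would NOT survive a restart.
    private final Map<String, FinishedJob> jobs = new ConcurrentHashMap<>();

    /** Records a finished job, replacing any earlier entry with the same ID. */
    void store(FinishedJob job) {
        jobs.put(job.jobId, job);
    }

    /** Lookup used by user-facing APIs, e.g. to list completed jobs. */
    Optional<FinishedJob> get(String jobId) {
        return Optional.ofNullable(jobs.get(jobId));
    }

    /** Called by the cleanup logic once a job's artifacts are removed. */
    void markCleanedUp(String jobId) {
        jobs.computeIfPresent(jobId, (id, job) -> job.asCleanedUp());
    }

    /** IDs of finished jobs whose cleanup has not been recorded yet. */
    List<String> jobsAwaitingCleanup() {
        List<String> ids = new ArrayList<>();
        for (FinishedJob job : jobs.values()) {
            if (!job.cleanedUp) {
                ids.add(job.jobId);
            }
        }
        return ids;
    }

    /** Last checkpoint of a finished job; empty if the job or checkpoint is unknown. */
    Optional<String> lastCheckpoint(String jobId) {
        return get(jobId).map(job -> job.lastCheckpointPath);
    }
}
```

The point of the sketch is that a single entry per finished job can serve both consumers: the cleanup logic only reads the status and cleanup flag, while user-facing APIs and external tools such as the Kubernetes Operator could read the job name and last checkpoint from the same record.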
--
This message was sent by Atlassian Jira
(v8.20.10#820010)