Thanks for creating this FLIP Matthias, Mika and David.

I think the JobResultStore is an important piece for fixing Flink's last
high-availability problem (afaik). Once we have this piece in place, users
no longer risk re-executing a successfully completed job.

I have one comment concerning breaking interfaces:

If we don't want to break interfaces, then we could keep the
HighAvailabilityServices.getRunningJobsRegistry() method and add a default
implementation for HighAvailabilityServices.getJobResultStore(). We could
then deprecate the former method and remove it in the subsequent
release (1.16).
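
To make the suggestion concrete, here is a rough sketch of the idea. The types below are simplified, hypothetical stand-ins, not the real Flink interfaces, and the helper classes LegacyHaServices and MigratedHaServices are invented for illustration. What the default implementation should actually return is left open here, so throwing an exception is only a placeholder assumption.

```java
// Hypothetical, simplified stand-ins for the Flink types named in this thread.
// The real interfaces have more members and different signatures.
interface RunningJobsRegistry {}

interface JobResultStore {}

interface HighAvailabilityServices {

    /** Kept (but deprecated) so existing implementations and callers still compile. */
    @Deprecated
    RunningJobsRegistry getRunningJobsRegistry();

    /**
     * New accessor with a default body, so implementations written against
     * the old interface need no changes. Throwing is a placeholder assumption;
     * a real default might instead return some fallback store.
     */
    default JobResultStore getJobResultStore() {
        throw new UnsupportedOperationException(
                "JobResultStore not provided by " + getClass().getName());
    }
}

/** Hypothetical implementation written before the change: compiles unmodified. */
final class LegacyHaServices implements HighAvailabilityServices {
    @Override
    public RunningJobsRegistry getRunningJobsRegistry() {
        return new RunningJobsRegistry() {};
    }
}

/** Hypothetical implementation that has migrated and overrides the new accessor. */
final class MigratedHaServices implements HighAvailabilityServices {
    private final JobResultStore store = new JobResultStore() {};

    @Override
    public RunningJobsRegistry getRunningJobsRegistry() {
        return new RunningJobsRegistry() {};
    }

    @Override
    public JobResultStore getJobResultStore() {
        return store;
    }
}
```

The point of the default method is that LegacyHaServices needs no source change when the new accessor is added, while MigratedHaServices can opt in right away.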

Apart from that, +1 for the FLIP.

Cheers,
Till

On Wed, Nov 17, 2021 at 6:05 PM David Morávek <d...@apache.org> wrote:

> Hi everyone,
>
> Matthias, Mika and I want to start a discussion about the introduction of a new
> Flink component, the *JobResultStore*.
>
> The main motivation is to address shortcomings of the *RunningJobsRegistry*
> and supersede it with the new component. These shortcomings were first
> described in FLINK-11813 [1].
>
> This change should improve the overall stability of the JobManager's
> components and address the race conditions in some of the failover
> scenarios during the job cleanup lifecycle.
>
> It should also help to ensure that Flink doesn't leave any uncleaned
> resources behind.
>
> We've prepared FLIP-194 [2], which outlines the design and reasoning
> behind this new component.
>
> [1] https://issues.apache.org/jira/browse/FLINK-11813
> [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=195726435
>
> We're looking forward to your feedback ;)
>
> Best,
> Matthias, Mika and David
>
