Thank you, Yi, for your reply. Looks good to me!

+1 for this proposal

Best,
Lei

Yi Zhang <[email protected]> wrote on Fri, Sep 26, 2025 at 10:43:

> Hi Lei,
>
>
> Thank you for the feedback! I really appreciate these great questions,
> and I would like to clarify my thinking:
>
>
> 1. Handling FINISHED jobs in the FAILING state
> The FAILING state is designed to close active components, so
> already-FINISHED jobs are intentionally left untouched. This keeps the
> state transitions clean and simple.
>
> 2. Application HA and the RESTARTING state
> This is a very interesting point. Application HA in the follow-up tasks is
> primarily centered around recovering from a JobManager failure (e.g., due
> to a machine crash). In that scenario, the JobManager itself is
> unavailable, making it impossible to update or query the application's
> status.
>
>
> However, you've brought up another excellent use case: automatically
> restarting an application in response to a failed job (or other errors in
> the main execution logic). This would be a powerful mechanism to build
> resilience against transient issues like network instability. For this
> scenario, you are absolutely right. Introducing a RESTARTING state for
> applications would be both reasonable and necessary to clearly indicate to
> the user that a recovery attempt is in progress.
> This capability seems like an important enhancement to application
> management, but it may involve significant work. To keep the scope of the
> current FLIP focused, I propose that we leave this functionality out for
> now.
> If you are interested, I would be very happy to discuss this feature
> further in a separate thread. I think it's a great direction for future
> work.
>
> Best Regards,
>
> Yi
>
>
> At 2025-09-25 17:32:10, "Lei Yang" <[email protected]> wrote:
> >Hi Yi, thanks for creating this FLIP!
> >
> >I'm trying to understand your FLIP. By introducing the Application entity,
> >you're able to organize jobs in a natural way, making them easier to
> >observe and manage. This is great work!
> >
> >I'd like to share some questions with you and hope you can help me
> >clarify them:
> >
> >1. When an application is in the FAILING state, how are the jobs that have
> >already reached the FINISHED state handled? Will they simply be ignored,
> >or will there be other actions taken?
> >
> >2. In the "Follow-up Tasks", you mentioned high availability for the
> >application, which will restart failed jobs to restore the application.
> >However, I didn't see a description of the application's status during
> >such restarts in the FLIP. Should we introduce a RESTARTING status to
> >explicitly indicate that the application is in the process of restarting?
> >
> >Best,
> >Lei
> >
> >Yi Zhang <[email protected]> wrote on Tue, Sep 23, 2025 at 11:24:
> >
> >> Hi everyone,
> >>
> >>
> >> I would like to start a discussion about FLIP-549: Support Application
> >> Management [1].
> >>
> >>
> >> Despite Flink’s widespread adoption, the existing model for running user
> >> logic limits observability and execution flexibility, which degrades the
> >> user experience. This FLIP introduces a new application management framework
> >> designed to close these gaps and provide a foundation for future
> >> improvements.
> >>
> >>
> >> Looking forward to your feedback and suggestions.
> >>
> >>
> >>
> >> [1]
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-549%3A+Support+Application+Management
> >>
> >>
> >> Best regards,
> >>
> >> Yi Zhang
>
