One thing to consider is how "JobManager" would no longer accurately reflect the responsibilities of that pod (in K8s).
I think it would be very difficult to rename that component, but I did want to point out that, with this change, the responsibilities of the JobManager sit at a higher level than a single Job.

Ryan van Huuksloot
Staff Engineer, Infrastructure | Streaming Platform

On Sun, Sep 28, 2025 at 2:31 AM Lei Yang <[email protected]> wrote:

> Thank you Yi for your reply, looks good to me!
>
> +1 for this proposal
>
> Best,
> Lei
>
> Yi Zhang <[email protected]> wrote on Fri, Sep 26, 2025 at 10:43:
>
> > Hi Lei,
> >
> > Thank you for the feedback! I really appreciate you sharing these great
> > questions and I would like to clarify my thinking:
> >
> > 1. Handling FINISHED jobs in FAILING state
> > The FAILING state is designed to close active components, so
> > already-FINISHED jobs are intentionally left untouched. This keeps the
> > state transitions clean and simple.
> >
> > 2. Application HA and RESTARTING state
> > This is a very interesting point. Application HA in the follow-up tasks
> > is primarily centered around recovering from a JobManager failure (e.g.,
> > due to a machine crash). In that scenario, the JobManager itself is
> > unavailable, making it impossible to update or query the application's
> > status.
> >
> > However, you've brought up another excellent use case: automatically
> > restarting an application in response to a failed job (or other errors in
> > the main execution logic). This would be a powerful mechanism to build
> > resilience against transient issues like network instability. For this
> > scenario, you are absolutely right. Introducing a RESTARTING state for
> > the application would be both reasonable and necessary to clearly
> > indicate to the user that a recovery attempt is in progress.
> > This capability seems like an important enhancement to application
> > management and may involve significant work. To keep the scope of the
> > current FLIP focused, I propose we don't include this functionality for
> > now.
> > If you are interested, I would be very happy to discuss this feature
> > further in a separate thread. I think it's a great direction for future
> > work.
> >
> > Best Regards,
> >
> > Yi
> >
> > At 2025-09-25 17:32:10, "Lei Yang" <[email protected]> wrote:
> > >Hi Yi, thanks for creating this FLIP!
> > >
> > >I'm trying to understand your FLIP. By introducing the Application
> > >entity, you're able to organically organize jobs, making them easier to
> > >observe and manage. This is great work!
> > >
> > >I'd like to share some questions with you, and hope you could help me
> > >clarify them:
> > >
> > >1. When an application is in the FAILING state, how are the jobs that
> > >have already reached the FINISHED state handled? Will they simply be
> > >ignored, or will there be other actions taken?
> > >
> > >2. In the "Follow-up Tasks", you mentioned high availability for the
> > >application, which will restart failed jobs to restore the application.
> > >However, I didn't see a description of the application's status during
> > >such restarts in the FLIP. I think we might need to introduce a
> > >RESTARTING status to explicitly indicate that the application is in the
> > >process of restarting?
> > >
> > >Best,
> > >Lei
> > >
> > >Yi Zhang <[email protected]> wrote on Tue, Sep 23, 2025 at 11:24:
> > >
> > >> Hi everyone,
> > >>
> > >> I would like to start a discussion about FLIP-549: Support Application
> > >> Management [1].
> > >>
> > >> Despite Flink’s widespread adoption, the existing model for running
> > >> user logic limits observability and execution flexibility, which
> > >> affects user experience. This FLIP introduces a new application
> > >> management framework designed to close these gaps and provide a
> > >> foundation for future improvements.
> > >>
> > >> Looking forward to your feedback and suggestions.
> > >>
> > >> [1]
> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-549%3A+Support+Application+Management
> > >>
> > >> Best regards,
> > >>
> > >> Yi Zhang
