One thing to consider is how "JobManager" would no longer accurately
reflect the responsibilities of that pod (in K8s).

Renaming that component would be very difficult, but I did want to point
out that, with this change, the JobManager's responsibilities sit at a
higher level than a single job.

Ryan van Huuksloot
Staff Engineer, Infrastructure | Streaming Platform


On Sun, Sep 28, 2025 at 2:31 AM Lei Yang <[email protected]> wrote:

> Thank you Yi for your reply, looks good to me!
>
> +1 for this proposal
>
> Best,
> Lei
>
> Yi Zhang <[email protected]> wrote on Fri, Sep 26, 2025 at 10:43:
>
> > Hi Lei,
> >
> >
> > Thank you for the feedback! I really appreciate you sharing these great
> > questions and I would like to clarify my thinking:
> >
> >
> > 1. Handling FINISHED jobs in FAILING state
> > The FAILING state is designed to close active components, so
> > already-FINISHED jobs are intentionally left untouched. This keeps the
> > state transitions clean and simple.
> >
> > 2. Application HA and RESTARTING state
> > This is a very interesting point. Application HA in the follow-up tasks
> > is primarily centered around recovering from a JobManager failure (e.g.,
> > due to a machine crash). In that scenario, the JobManager itself is
> > unavailable, making it impossible to update or query the application's
> > status.
> >
> >
> > However, you've brought up another excellent use case: automatically
> > restarting an application in response to a failed job (or other errors in
> > the main execution logic). This would be a powerful mechanism to build
> > resilience against transient issues like network instability. For this
> > scenario, you are absolutely right. Introducing a RESTARTING state for
> > the application would be both reasonable and necessary to clearly
> > indicate to the user that a recovery attempt is in progress.
> > This capability seems like an important enhancement to application
> > management and may involve significant work. To keep the scope of the
> > current FLIP focused, I propose we don't include this functionality for
> > now.
> > If you are interested, I would be very happy to discuss this feature
> > further in a separate thread. I think it's a great direction for future
> > work.
> >
> >
> >
> >
> > Best Regards,
> >
> > Yi
> >
> >
> > At 2025-09-25 17:32:10, "Lei Yang" <[email protected]> wrote:
> > >Hi Yi, thanks for creating this FLIP!
> > >
> > >I'm trying to understand your FLIP. By introducing the Application
> > >entity, you're able to organically organize jobs, making them easier to
> > >observe and manage. This is great work!
> > >
> > >I'd like to share some questions with you, and hope you could help me
> > >clarify them:
> > >
> > >1. When an application is in the FAILING state, how are the jobs that
> > >have already reached the FINISHED state handled? Will they simply be
> > >ignored, or will there be other actions taken?
> > >
> > >2. In the "Follow-up Tasks", you mentioned high availability for the
> > >application, which will restart failed jobs to restore the application.
> > >However, I didn't see the description of the application's status
> > >during such restarts in the FLIP. I think we might need to introduce a
> > >RESTARTING status to explicitly indicate the application is in the
> > >process of restarting?
> > >
> > >Best,
> > >Lei
> > >
> > >Yi Zhang <[email protected]> wrote on Tue, Sep 23, 2025 at 11:24:
> > >
> > >> Hi everyone,
> > >>
> > >>
> > >> I would like to start a discussion about FLIP-549: Support Application
> > >> Management [1].
> > >>
> > >>
> > >> Despite Flink’s widespread adoption, the existing model for running
> > >> user logic limits observability and execution flexibility, which
> > >> affects user experience. This FLIP introduces a new application
> > >> management framework designed to close these gaps and provide a
> > >> foundation for future improvements.
> > >>
> > >>
> > >> Looking forward to your feedback and suggestions.
> > >>
> > >>
> > >>
> > >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-549%3A+Support+Application+Management
> > >>
> > >>
> > >> Best regards,
> > >>
> > >> Yi Zhang
> >
>
