Hi Ryan,

Thanks for the feedback! I'm glad that you find the FLIP interesting. As you 
noted, it introduces a significant shift, with the core goal of providing users 
with observability and execution flexibility at a higher level (application) -- 
beyond jobs.
Your point about the UI navigation is very well-taken. Let me share the 
original thinking behind the design. The intended primary user flow is for 
users to land on a list of applications. They would then click on an 
application to navigate to its details page, which contains the list of its 
corresponding jobs. This flow is designed to inherently reflect the 
hierarchical relationship.
The reason I kept the separate "Jobs" tab in the navigation was that some users 
might not have an immediate need for the higher-level application context and 
would prefer a quick way to view a flat list of all jobs.
However, if it obscures the primary hierarchy, I agree we should address it. 
One option is to remove the "Jobs" tab so that the hierarchy is unambiguous. Or 
do you have alternative improvements in mind?
Looking forward to your suggestions.



Best Regards,
Yi

At 2025-09-25 12:41:22, "Ryan van Huuksloot" 
<[email protected]> wrote:
>Hi Yi,
>
>Interesting FLIP, thanks for putting it together. Overall it would be good
>to unify the dispatcher for the different modes - although this will be a
>big lift.
>
>One small question I had was in relation to the hierarchy in the UI.
>Specifically the left navigation. At first glance it wasn't clear to me as
>a user that Applications and Jobs are tied together from the navigation. I
>saw that you can click on an application to get its corresponding jobs but
>the navigation still gives me pause. Maybe there is a UI change we can make
>to clear up any potential confusion?
>
>Ryan van Huuksloot
>Staff Engineer, Infrastructure | Streaming Platform
>
>
>On Wed, Sep 24, 2025 at 11:48 PM Yi Zhang <[email protected]> wrote:
>
>> Hi Shengkai,
>>
>>
>>
>>
>> Thanks for taking the time to review the FLIP and for your thoughtful and
>> constructive feedback! I would like to share the planned updates based on
>> your points:
>>
>>
>>
>> 1. Asynchronous REST API for Application Submission
>>
>> This is an excellent point. To maintain compatibility, the existing
>> /jars/:jarid/run API will retain its synchronous behavior, returning a Job
>> ID only after the user's main method has completed and the job has been
>> submitted. However, as you pointed out, this can lead to long response
>> times and poor user experience.
>>
>> Therefore, we plan to introduce a new asynchronous REST API, likely
>> /jars/:jarid/run-application. This API will submit the application and
>> return an Application ID for status polling immediately, without waiting
>> for the main method or job submission to finish. I'll add this to the
>> proposal.
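
A minimal sketch of the submit-then-poll flow described above. Everything here
is an assumption for illustration: `FakeCluster` is an in-memory stand-in, not
Flink code; `get_status` and the status names are hypothetical, since only the
tentative /jars/:jarid/run-application path is mentioned in the thread.

```python
import threading
import time
import uuid

# Hypothetical terminal status names; the FLIP may define different ones.
TERMINAL_STATUSES = ("FINISHED", "FAILED", "CANCELED")


class FakeCluster:
    """In-memory stand-in for the cluster's REST handlers (not Flink code)."""

    def __init__(self):
        self._status = {}
        self._lock = threading.Lock()

    def run_application(self, jar_id, main):
        """Models POST /jars/:jarid/run-application: returns an ID at once."""
        app_id = uuid.uuid4().hex
        with self._lock:
            self._status[app_id] = "RUNNING"

        def work():
            # The user's main method runs in the background, so the caller
            # never waits for it or for job submission.
            try:
                main()
                final = "FINISHED"
            except Exception:
                final = "FAILED"
            with self._lock:
                self._status[app_id] = final

        threading.Thread(target=work, daemon=True).start()
        return app_id

    def get_status(self, app_id):
        """Models a hypothetical status endpoint keyed by Application ID."""
        with self._lock:
            return self._status[app_id]


def poll_until_terminal(cluster, app_id, interval=0.01, timeout=5.0):
    """Polls the application status until it is terminal or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = cluster.get_status(app_id)
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval)
    raise TimeoutError(f"application {app_id} not terminal after {timeout}s")
```

A client would call `run_application` once, keep the returned ID, and call
`poll_until_terminal` (or query the status on its own schedule) afterwards.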
>>
>>
>>
>> 2. Clarification on "Pre-termination Cleanup"
>>
>> Thank you for pointing out the ambiguity. I will update the document to
>> clarify that "pre-termination cleanup" refers to the process where an
>> application, before transitioning to a terminal state, will actively cancel
>> all the jobs it manages and wait for them to reach their own terminal
>> states. This ensures that the application's lifecycle is cohesively tied to
>> the lifecycle of the jobs it owns.
>>
>>
>>
>> 3. Potential Job Leak Prevention
>>
>> You've raised a critical concern here. As described in the point above,
>> the primary mechanism is that the application itself ensures its jobs are
>> terminated before it shuts down, which should prevent leaks in normal
>> circumstances.
>>
>> The question then becomes how to handle the exceptional case where a job
>> fails to respond to a cancellation request. Upon further reflection, I
>> believe that if a job is unresponsive to a cancellation initiated by the
>> application, a background monitor issuing the same request would likely
>> face the same problem.
>>
>> Therefore, triggering a fatal error may be a more appropriate action in
>> this scenario. While a fatal error in a session cluster could affect other
>> running applications, an unresponsive job indicates a severe underlying
>> issue that warrants such a drastic measure to prevent an inconsistent and
>> unpredictable system state. I will update the proposal to detail this
>> fault-tolerance strategy and the reasoning behind it.
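
A rough sketch of the cleanup and escalation logic from points 2 and 3, under
stated assumptions: the `Job` class, the state names, `FatalError`, and the use
of a timeout to decide that a job is unresponsive are all hypothetical. The
thread does not say how unresponsiveness would be detected.

```python
import time

# Hypothetical terminal job states for this sketch.
TERMINAL_STATES = {"FINISHED", "FAILED", "CANCELED"}


class FatalError(RuntimeError):
    """Stands in for the fatal error that would take the cluster down."""


class Job:
    """Toy job; `responsive=False` models a job that ignores cancellation."""

    def __init__(self, job_id, responsive=True):
        self.job_id = job_id
        self.state = "RUNNING"
        self._responsive = responsive

    def cancel(self):
        if self._responsive:
            self.state = "CANCELED"


def pre_termination_cleanup(jobs, timeout=1.0, interval=0.01):
    """Cancels all non-terminal jobs, then waits for them to become terminal.

    Raises FatalError if any job is still non-terminal after `timeout`
    seconds (the timeout is an assumption of this sketch).
    """
    for job in jobs:
        if job.state not in TERMINAL_STATES:
            job.cancel()

    deadline = time.monotonic() + timeout
    while True:
        pending = [j.job_id for j in jobs if j.state not in TERMINAL_STATES]
        if not pending:
            return  # the application may now enter its own terminal state
        if time.monotonic() >= deadline:
            raise FatalError(f"jobs did not respond to cancel: {pending}")
        time.sleep(interval)
```

In this sketch the application only reaches a terminal state once
`pre_termination_cleanup` returns; an unresponsive job escalates instead of
being left behind as a leak.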
>>
>>
>>
>> 4. API Compatibility Considerations
>>
>> Ensuring a smooth transition for existing users is a top priority and I
>> can confirm that all existing Job ID-based REST APIs will remain fully
>> functional.
>>
>> Users will still be able to query and cancel jobs launched via this new
>> application framework using Job IDs. I will add a specific section in the
>> document to explicitly state this, reassuring users that their existing
>> tools and scripts will continue to work as expected.
>>
>>
>>
>> Once again, thank you for your invaluable input. I will incorporate these
>> changes into the document shortly. Please let me know if you have any
>> further questions or suggestions.
>>
>>
>>
>> Best regards,
>>
>> Yi
>>
>> At 2025-09-24 16:59:52, "Shengkai Fang" <[email protected]> wrote:
>> >Hi, Yi.
>> >
>> >The FLIP is both interesting and highly promising for Flink users. Once
>> >implemented, it will enable powerful use cases—such as running a Jupyter
>> >Notebook kernel or SQL Gateway as a first-class application within the
>> >JobManager. This represents a significant step forward in usability and
>> >integration.
>> >
>> >I’d like to share a few suggestions and clarifications that could help
>> >strengthen the proposal:
>> >
>> >*Asynchronous REST API for Application Submission*
>> >
>> >Given that launching such applications may involve complex initialization
>> >and take considerable time to complete, it would be beneficial to support
>> >an asynchronous submission mechanism via REST. A synchronous endpoint
>> >might lead to timeouts or poor user experience. An async API could return an
>> >application ID immediately, allowing users to poll or query the status of
>> >the deployment using that identifier.
>> >
>> >*Clarification on "Pre-termination Cleanup"*
>> >The term pre-termination cleanup is mentioned several times in the
>> >document. Could you please elaborate on what this entails? Specifically,
>> >which resources are expected to be released, and at what point in the life
>> >cycle does this occur? A clearer definition would help ensure consistent
>> >implementation and improve reliability.
>> >
>> >*Potential Job Leak Prevention*
>> >
>> >There appears to be a risk of job leaks if an application fails to
>> >properly cancel its associated Flink job upon termination. To mitigate this, we
>> >might consider introducing a background daemon thread (or a monitoring
>> >service) that periodically checks for orphaned jobs whose parent
>> >applications have already terminated, and automatically triggers cleanup.
>> >Alternatively, integrating with Flink’s existing lifecycle management
>> >mechanisms could help ensure robust resource cleanup.
>> >
>> >*API Compatibility Considerations*
>> >
>> >It would be helpful to clarify how the new application model aligns with
>> >existing APIs. Many external systems currently rely on job IDs to monitor
>> >or cancel jobs. Will these operations still be supported under the new
>> >model? For example, can users continue to use the existing REST endpoints
>> >to cancel a job or check its status using the job ID, even when the job
>> >was launched through this new application framework?
>> >
>> >
>> >Best,
>> >Shengkai
>> >
>> >On Tue, Sep 23, 2025 at 11:23, Yi Zhang <[email protected]> wrote:
>> >
>> >> Hi everyone,
>> >>
>> >>
>> >> I would like to start a discussion about FLIP-549: Support Application
>> >> Management [1].
>> >>
>> >>
>> >> Despite Flink’s widespread adoption, the existing model for running user
>> >> logic limits observability and execution flexibility, which affects user
>> >> experience. This FLIP introduces a new application management framework
>> >> designed to close these gaps and provide a foundation for future
>> >> improvements.
>> >>
>> >>
>> >> Looking forward to your feedback and suggestions.
>> >>
>> >>
>> >>
>> >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-549%3A+Support+Application+Management
>> >>
>> >>
>> >> Best regards,
>> >>
>> >> Yi Zhang
>>
