Hi Shengkai,
Just a quick note: the FLIP has been updated to reflect the latest changes to the REST API and the Application status description.

Thanks,
Yi

At 2025-09-25 11:47:41, "Yi Zhang" <[email protected]> wrote:
>Hi Shengkai,
>
>Thanks for taking the time to review the FLIP and for your thoughtful and
>constructive feedback! I would like to share the planned updates based on your
>points:
>
>1. Asynchronous REST API for Application Submission
>
>This is an excellent point. To maintain compatibility, the existing
>/jars/:jarid/run API will retain its synchronous behavior, returning a Job ID
>only after the user's main method has completed and the job has been
>submitted. However, as you pointed out, this can lead to long response times
>and poor user experience.
>
>Therefore, we plan to introduce a new asynchronous REST API, likely
>/jars/:jarid/run-application. This API will submit the application and return
>an Application ID for status polling immediately, without waiting for the main
>method or job submission to finish. I'll add this to the proposal.
>
>2. Clarification on "Pre-termination Cleanup"
>
>Thank you for pointing out the ambiguity. I will update the document to
>clarify that "pre-termination cleanup" refers to the process where an
>application, before transitioning to a terminal state, will actively cancel
>all the jobs it manages and wait for them to reach their own terminal states.
>This ensures that the application's lifecycle is cohesively tied to the
>lifecycle of the jobs it owns.
>
>3. Potential Job Leak Prevention
>
>You've raised a critical concern here. As described in the point above, the
>primary mechanism is that the application itself ensures its jobs are
>terminated before it shuts down, which should prevent leaks in normal
>circumstances.
>
>The question then becomes how to handle the exceptional case where a job fails
>to respond to a cancellation request.
>Upon further reflection, I believe that
>if a job is unresponsive to a cancellation initiated by the application, a
>background monitor issuing the same request would likely face the same problem.
>
>Therefore, maybe triggering a fatal error is a more appropriate action in this
>scenario. While a fatal error in a session cluster could affect other running
>applications, an unresponsive job indicates a severe underlying issue that
>warrants such a drastic measure to prevent an inconsistent and unpredictable
>system state. I will update the proposal to detail this fault-tolerance
>strategy and the reasoning behind it.
>
>4. API Compatibility Considerations
>
>Ensuring a smooth transition for existing users is a top priority and I can
>confirm that all existing Job ID-based REST APIs will remain fully functional.
>
>Users will still be able to query and cancel jobs launched via this new
>application framework using Job IDs. I will add a specific section in the
>document to explicitly state this, reassuring users that their existing tools
>and scripts will continue to work as expected.
>
>Once again, thank you for your invaluable input. I will incorporate these
>changes into the document shortly. Please let me know if you have any further
>questions or suggestions.
>
>Best regards,
>
>Yi
>
>At 2025-09-24 16:59:52, "Shengkai Fang" <[email protected]> wrote:
>>Hi, Yi.
>>
>>The FLIP is both interesting and highly promising for Flink users. Once
>>implemented, it will enable powerful use cases, such as running a Jupyter
>>Notebook kernel or SQL Gateway as a first-class application within the
>>JobManager. This represents a significant step forward in usability and
>>integration.
>>
>>I’d like to share a few suggestions and clarifications that could help
>>strengthen the proposal:
>>
>>*Asynchronous REST API for Application Submission*
>>
>>Given that launching such applications may involve complex initialization
>>and take considerable time to complete, it would be beneficial to support
>>an asynchronous submission mechanism via REST. A synchronous endpoint might
>>lead to timeouts or poor user experience. An async API could return an
>>application ID immediately, allowing users to poll or query the status of
>>the deployment using that identifier.
>>
>>*Clarification on "Pre-termination Cleanup"*
>>
>>The term pre-termination cleanup is mentioned several times in the
>>document. Could you please elaborate on what this entails? Specifically,
>>which resources are expected to be released, and at what point in the life
>>cycle does this occur? A clearer definition would help ensure consistent
>>implementation and improve reliability.
>>
>>*Potential Job Leak Prevention*
>>
>>There appears to be a risk of job leaks if an application fails to properly
>>cancel its associated Flink job upon termination. To mitigate this, we
>>might consider introducing a background daemon thread (or a monitoring
>>service) that periodically checks for orphaned jobs whose parent
>>applications have already terminated, and automatically triggers cleanup.
>>Alternatively, integrating with Flink’s existing lifecycle management
>>mechanisms could help ensure robust resource cleanup.
>>
>>*API Compatibility Considerations*
>>
>>It would be helpful to clarify how the new application model aligns with
>>existing APIs. Many external systems currently rely on job IDs to monitor
>>or cancel jobs. Will these operations still be supported under the new
>>model? For example, can users continue to use the existing REST endpoints
>>to cancel a job or check its status using the job ID, even when the job was
>>launched through this new application framework?
>>
>>Best,
>>Shengkai
>>
>>Yi Zhang <[email protected]> wrote on Tue, Sep 23, 2025 at 11:23:
>>
>>> Hi everyone,
>>>
>>> I would like to start a discussion about FLIP-549: Support Application
>>> Management [1].
>>>
>>> Despite Flink’s widespread adoption, the existing model for running user
>>> logic limits observability and execution flexibility, which affects user
>>> experience. This FLIP introduces a new application management framework
>>> designed to close these gaps and provide a foundation for future
>>> improvements.
>>>
>>> Looking forward to your feedback and suggestions.
>>>
>>> [1]
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-549%3A+Support+Application+Management
>>>
>>> Best regards,
>>>
>>> Yi Zhang
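
For readers following the thread, the "pre-termination cleanup" from point 2 and the fatal-error escalation from point 3 of the reply can be sketched roughly as below. This is only a toy model in Python, not Flink code: the class names (Application, Job, FatalError), the status strings, the responsive flag, and the timeout values are all hypothetical and chosen for illustration. It assumes the application cancels every job it owns, waits for each to reach a terminal state, and escalates if any job stays unresponsive past a deadline.

```python
import time
from dataclasses import dataclass, field
from enum import Enum


class JobState(Enum):
    RUNNING = "RUNNING"
    CANCELED = "CANCELED"
    FINISHED = "FINISHED"
    FAILED = "FAILED"


TERMINAL_JOB_STATES = {JobState.CANCELED, JobState.FINISHED, JobState.FAILED}


class FatalError(RuntimeError):
    """Hypothetical stand-in for escalating to a cluster-level fatal error."""


@dataclass
class Job:
    job_id: str
    state: JobState = JobState.RUNNING
    responsive: bool = True  # hypothetical flag used to simulate a stuck job

    def cancel(self) -> None:
        # A responsive job reaches a terminal state; a stuck one ignores the request.
        if self.responsive and self.state not in TERMINAL_JOB_STATES:
            self.state = JobState.CANCELED


@dataclass
class Application:
    application_id: str
    jobs: list = field(default_factory=list)
    status: str = "RUNNING"  # hypothetical status names, not the FLIP's

    def terminate(self, timeout_s: float = 1.0, poll_s: float = 0.01) -> None:
        """Cancel every owned job, wait for terminal states, then finish."""
        self.status = "CANCELING"
        for job in self.jobs:
            job.cancel()
        deadline = time.monotonic() + timeout_s
        while True:
            pending = [j.job_id for j in self.jobs
                       if j.state not in TERMINAL_JOB_STATES]
            if not pending:
                break
            if time.monotonic() >= deadline:
                # Re-sending the same cancel request would likely hit the same
                # problem, so this sketch escalates instead of retrying.
                raise FatalError(f"jobs did not terminate: {pending}")
            time.sleep(poll_s)
        # The application becomes terminal only after all of its jobs are.
        self.status = "CANCELED"
```

In this sketch the application never reaches a terminal status while one of its jobs is still running, which is what rules out leaked jobs in the normal path. Jobs remain addressable by their own job_id throughout, in line with point 4.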
