Hi, Yi.

The FLIP is both interesting and highly promising for Flink users. Once
implemented, it will enable powerful use cases, such as running a Jupyter
Notebook kernel or a SQL Gateway as a first-class application within the
JobManager. This represents a significant step forward in usability and
integration.

I’d like to share a few suggestions and clarifications that could help
strengthen the proposal:

*Asynchronous REST API for Application Submission*

Given that launching such applications may involve complex initialization
and take considerable time to complete, it would be beneficial to support
an asynchronous submission mechanism via REST. A synchronous endpoint might
lead to request timeouts or a poor user experience. An async API could return an
application ID immediately, allowing users to poll or query the status of
the deployment using that identifier.
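To make the suggestion concrete, here is a rough Java sketch of the submit-then-poll pattern. Everything in it is hypothetical and not an existing Flink class. That includes `AsyncApplicationSubmitter`, the `Status` values, and the in-memory status map. A real implementation would sit behind a REST handler and keep its state in the JobManager.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not a Flink API: submission returns an ID immediately and
// initialization continues in the background; clients poll status by that ID.
class AsyncApplicationSubmitter {

    enum Status { INITIALIZING, RUNNING, FAILED }

    private final Map<String, Status> statuses = new ConcurrentHashMap<>();
    private final ExecutorService initPool = Executors.newCachedThreadPool();

    /** What a hypothetical async submit endpoint would do: register, start init, return the ID. */
    String submit(Runnable initialization) {
        String applicationId = UUID.randomUUID().toString();
        statuses.put(applicationId, Status.INITIALIZING);
        initPool.execute(() -> {
            try {
                initialization.run(); // may take a long time, e.g. starting a notebook kernel
                statuses.put(applicationId, Status.RUNNING);
            } catch (RuntimeException e) {
                statuses.put(applicationId, Status.FAILED);
            }
        });
        return applicationId; // returned before initialization finishes
    }

    /** What a hypothetical status endpoint would return; null for an unknown ID. */
    Status status(String applicationId) {
        return statuses.get(applicationId);
    }

    void shutdown() {
        initPool.shutdown();
    }
}
```

A client would call the submit endpoint once and keep the returned ID. It would then poll the status endpoint until the application leaves `INITIALIZING`.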

*Clarification on "Pre-termination Cleanup"*

The term "pre-termination cleanup" is mentioned several times in the
document. Could you please elaborate on what this entails? Specifically,
which resources are expected to be released, and at what point in the
lifecycle does this occur? A clearer definition would help ensure a
consistent implementation and improve reliability.

*Potential Job Leak Prevention*

There appears to be a risk of job leaks if an application fails to properly
cancel its associated Flink job upon termination. To mitigate this, we
might consider introducing a background daemon thread (or a monitoring
service) that periodically checks for orphaned jobs whose parent
applications have already terminated, and automatically triggers cleanup.
Alternatively, integrating with Flink’s existing lifecycle management
mechanisms could help ensure robust resource cleanup.
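Here is a minimal Java sketch of the daemon idea. It assumes the JobManager can track which application owns each job. `OrphanedJobReaper` and its methods are hypothetical names. The `cancelJob` callback stands in for whatever cancellation path Flink would actually use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch, not a Flink API: a background sweeper that cancels jobs
// whose owning application has already terminated.
class OrphanedJobReaper {

    private final Map<String, String> jobToApplication = new ConcurrentHashMap<>();
    private final Set<String> terminatedApplications = ConcurrentHashMap.newKeySet();
    private final Consumer<String> cancelJob; // placeholder for the real cancel path
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "orphaned-job-reaper");
                t.setDaemon(true); // never keeps the JVM alive on its own
                return t;
            });

    OrphanedJobReaper(Consumer<String> cancelJob) {
        this.cancelJob = cancelJob;
    }

    void registerJob(String jobId, String applicationId) {
        jobToApplication.put(jobId, applicationId);
    }

    void markApplicationTerminated(String applicationId) {
        terminatedApplications.add(applicationId);
    }

    /** One sweep: cancels, then forgets, every job whose parent application has terminated. */
    List<String> sweepOnce() {
        List<String> cancelled = new ArrayList<>();
        for (Map.Entry<String, String> e : jobToApplication.entrySet()) {
            if (terminatedApplications.contains(e.getValue())) {
                cancelJob.accept(e.getKey()); // if this throws, the job stays and is retried
                jobToApplication.remove(e.getKey(), e.getValue());
                cancelled.add(e.getKey());
            }
        }
        return cancelled;
    }

    void start(long periodSeconds) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                sweepOnce();
            } catch (RuntimeException e) {
                // swallow (a real service would log) so the schedule keeps running
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

The sweep cancels a job before forgetting it, so a failed cancellation is retried on the next sweep. The try/catch in `start` matters because a `ScheduledExecutorService` suppresses all later runs once a periodic task throws.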

*API Compatibility Considerations*

It would be helpful to clarify how the new application model aligns with
existing APIs. Many external systems currently rely on job IDs to monitor
or cancel jobs. Will these operations still be supported under the new
model? For example, can users continue to use the existing REST endpoints
to cancel a job or check its status using the job ID, even when the job was
launched through this new application framework?


Best,
Shengkai

On Tue, Sep 23, 2025 at 11:23, Yi Zhang <[email protected]> wrote:

> Hi everyone,
>
>
> I would like to start a discussion about FLIP-549: Support Application
> Management [1].
>
>
> Despite Flink’s widespread adoption, the existing model for running user
> logic limits observability and execution flexibility, which affects user
> experience. This FLIP introduces a new application management framework
> designed to close these gaps and provide a foundation for future
> improvements.
>
>
> Looking forward to your feedback and suggestions.
>
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-549%3A+Support+Application+Management
>
>
> Best regards,
>
> Yi Zhang
