featzhang created FLINK-39627:
---------------------------------
Summary: Introduce flink-gpu-sidecar module with service skeleton
Key: FLINK-39627
URL: https://issues.apache.org/jira/browse/FLINK-39627
Project: Flink
Issue Type: Sub-task
Components: Build System, Runtime / Task
Reporter: featzhang
h2. Background
GPU-accelerated inference benefits significantly from keeping models
resident in GPU memory and amortising the model-load cost across many requests. The
umbrella proposal introduces a long-lived sidecar process, co-located with
each GPU-enabled TaskManager, that owns the model and serves inference
requests over RPC. This sub-task establishes the module and the minimum
service skeleton, without yet implementing the actual inference path.
h2. Scope of this sub-task
* Add a new Maven module {{flink-gpu-sidecar}} with the standard Flink
build conventions (license headers, shade configuration, module
descriptor, NOTICE file).
* Define the configuration surface:
** {{sidecar.rpc.endpoint}} - bind address (Unix domain socket path or TCP host:port).
** {{sidecar.model.uri}} - location of the model to load at startup.
** {{sidecar.health.port}} - HTTP port exposing a {{/health}} endpoint.
* Provide a process entry point that reads the config, exposes a
{{/health}} endpoint returning {{READY}} or {{NOT_READY}}, and blocks
until SIGTERM, at which point it shuts down gracefully.
* Publish the initial RPC service surface (proto file + generated stubs)
containing only a {{Ping}} method. The inference method is added in the
next sub-task.
* Provide a script under {{flink-dist}} to start the sidecar in the
TaskManager's lifecycle directory, disabled by default.
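To make the configuration surface and entry-point behaviour above concrete, here is a minimal sketch using only the JDK. It is not the proposed implementation: the class and method names ({{SidecarSkeleton}}, {{SidecarConfig}}, {{startHealthServer}}) are hypothetical, reading options from a {{java.util.Properties}} file is an assumption (the issue does not say how the config is supplied), the 200/503 status codes are an assumption, and the one-second stop delay is a placeholder for the grace period. The RPC server and model loading are left out.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical skeleton; all class and method names here are illustrative. */
final class SidecarSkeleton {

    /** Holder for the three options named in the scope list. */
    static final class SidecarConfig {
        final String rpcEndpoint;
        final String modelUri;
        final int healthPort;

        SidecarConfig(String rpcEndpoint, String modelUri, int healthPort) {
            this.rpcEndpoint = rpcEndpoint;
            this.modelUri = modelUri;
            this.healthPort = healthPort;
        }

        /** Assumption: options arrive as java.util.Properties. */
        static SidecarConfig fromProperties(Properties props) {
            return new SidecarConfig(
                    require(props, "sidecar.rpc.endpoint"),
                    require(props, "sidecar.model.uri"),
                    Integer.parseInt(require(props, "sidecar.health.port")));
        }

        private static String require(Properties props, String key) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalArgumentException("Missing required option: " + key);
            }
            return value.trim();
        }
    }

    /** Serves /health with READY or NOT_READY; the 200/503 codes are an assumption. */
    static HttpServer startHealthServer(int port, AtomicBoolean ready) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/health", exchange -> {
            boolean isReady = ready.get(); // read once so status and body agree
            byte[] body = (isReady ? "READY" : "NOT_READY").getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(isReady ? 200 : 503, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            props.load(in);
        }
        SidecarConfig config = SidecarConfig.fromProperties(props);

        AtomicBoolean ready = new AtomicBoolean(false);
        HttpServer health = startHealthServer(config.healthPort, ready);
        CountDownLatch stopped = new CountDownLatch(1);

        // The JVM runs shutdown hooks when it receives SIGTERM.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            ready.set(false); // report NOT_READY while draining
            health.stop(1);   // placeholder grace period, in seconds
            stopped.countDown();
        }));

        // The real sidecar would start the RPC server and load the model here.
        ready.set(true);
        stopped.await(); // block until the shutdown hook has finished
    }
}
```

Under these assumptions the process would be started as {{java SidecarSkeleton sidecar.properties}}, where the file sets the three {{sidecar.*}} keys. Keeping readiness in a single flag that the shutdown hook clears first means {{/health}} stops advertising {{READY}} before the HTTP server is torn down.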
h2. Out of scope
* No batching, no queueing, no real inference.
* No integration with any specific model format (that is carried by
concrete backends added later).
* No security / TLS (tracked separately).
h2. Acceptance criteria
* {{mvn -pl flink-gpu-sidecar -am verify}} passes.
* Starting the sidecar with a minimal configuration reaches {{READY}} state
within five seconds on a developer laptop.
* {{Ping}} RPC round-trips end-to-end in an integration test.
* Clean shutdown on SIGTERM within the configured grace period.
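The readiness criterion above could be checked in an integration test with a helper along the following lines. This is a sketch only: {{ReadinessProbe}} and {{awaitReady}} are hypothetical names, the 100 ms poll interval and 500 ms per-request timeout are arbitrary choices, and the check assumes {{/health}} returns the literal body {{READY}}.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical test helper; not part of the proposed module. */
final class ReadinessProbe {

    /** Polls the health URI until the body is READY or the timeout elapses. */
    static boolean awaitReady(URI healthUri, Duration timeout) throws InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofMillis(500))
                .build();
        HttpRequest request = HttpRequest.newBuilder(healthUri)
                .timeout(Duration.ofMillis(500))
                .GET()
                .build();

        long deadline = System.nanoTime() + timeout.toNanos();
        while (System.nanoTime() < deadline) {
            try {
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                if ("READY".equals(response.body().trim())) {
                    return true;
                }
            } catch (IOException e) {
                // Sidecar not listening yet (or request timed out); retry.
            }
            Thread.sleep(100); // arbitrary poll interval
        }
        return false;
    }
}
```

A test would call, for example, {{ReadinessProbe.awaitReady(URI.create("http://localhost:8085/health"), Duration.ofSeconds(5))}} and assert that the result is {{true}}; the port {{8085}} is a made-up value standing in for whatever {{sidecar.health.port}} is set to.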
h2. Affected modules
* New: {{flink-gpu-sidecar}}
* {{flink-dist}} (opt-in launch script)
h2. Links
Parent: see umbrella issue linked to this sub-task.