featzhang created FLINK-39627:
---------------------------------

             Summary: Introduce flink-gpu-sidecar module with service skeleton
                 Key: FLINK-39627
                 URL: https://issues.apache.org/jira/browse/FLINK-39627
             Project: Flink
          Issue Type: Sub-task
          Components: Build System, Runtime / Task
            Reporter: featzhang


h2. Background

GPU-accelerated inference benefits significantly from keeping models
resident in GPU memory and amortising the model-loading cost across many
requests. The
umbrella proposal introduces a long-lived sidecar process, co-located with
each GPU-enabled TaskManager, that owns the model and serves inference
requests over RPC. This sub-task establishes the module and the minimum
service skeleton, without yet implementing the actual inference path.

h2. Scope of this sub-task

* Add a new Maven module {{flink-gpu-sidecar}} with the standard Flink
 build conventions (license headers, shade configuration, module
 descriptor, NOTICE file).
* Define the configuration surface:
** {{sidecar.rpc.endpoint}} - bind address (UDS path or TCP host:port).
** {{sidecar.model.uri}} - location of the model to load at startup.
** {{sidecar.health.port}} - HTTP port exposing a {{/health}} endpoint.
* Provide a process entry point that reads the config, exposes a
 {{/health}} endpoint returning {{READY}} or {{NOT_READY}}, and blocks
 until SIGTERM, at which point it shuts down gracefully.
* Publish the minimal RPC service surface (proto file + generated stubs)
 containing only a {{Ping}} method. The inference method is added in the
 next sub-task.
* Provide a script under {{flink-dist}} to start the sidecar in the
 TaskManager's lifecycle directory, disabled by default.
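
The entry-point behaviour listed above could be sketched roughly as follows. This is a minimal illustration, not the module's actual design: the class name {{SidecarSkeletonSketch}} and its methods are hypothetical, {{java.util.Properties}} stands in for whatever configuration mechanism the module ends up using, the 200/503 status codes are an assumption (the issue only specifies the {{READY}} / {{NOT_READY}} body), and the RPC server and model loading are omitted entirely.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch only; names are illustrative, not the module's API. */
final class SidecarSkeletonSketch {
    // Configuration keys as listed in this issue.
    static final String RPC_ENDPOINT = "sidecar.rpc.endpoint";
    static final String MODEL_URI = "sidecar.model.uri";
    static final String HEALTH_PORT = "sidecar.health.port";

    private final AtomicBoolean ready = new AtomicBoolean(false);
    private final CountDownLatch stopped = new CountDownLatch(1);
    private volatile HttpServer health;

    /** Returns the trimmed value for a key, failing fast if it is missing or blank. */
    static String require(Properties conf, String key) {
        String value = conf.getProperty(key);
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException("Missing config key: " + key);
        }
        return value.trim();
    }

    /** Validates the config, starts the /health endpoint, and returns the bound port. */
    int start(Properties conf) throws IOException {
        require(conf, RPC_ENDPOINT); // validated only; no RPC server in this sketch
        require(conf, MODEL_URI);    // validated only; no model is loaded here
        int port = Integer.parseInt(require(conf, HEALTH_PORT));

        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/health", exchange -> {
            boolean ok = ready.get();
            byte[] body = (ok ? "READY" : "NOT_READY").getBytes(StandardCharsets.UTF_8);
            // Status codes are an assumption; the issue only specifies the body text.
            exchange.sendResponseHeaders(ok ? 200 : 503, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        health = server;
        ready.set(true);
        return server.getAddress().getPort();
    }

    /** Flips readiness off, stops the health server, and releases the main thread. */
    void stop() {
        ready.set(false);
        HttpServer server = health;
        if (server != null) {
            server.stop(0);
        }
        stopped.countDown();
    }

    void awaitStopped() throws InterruptedException {
        stopped.await();
    }

    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            conf.load(in);
        }
        SidecarSkeletonSketch sidecar = new SidecarSkeletonSketch();
        // The JVM runs shutdown hooks when it receives SIGTERM.
        Runtime.getRuntime().addShutdownHook(new Thread(sidecar::stop));
        sidecar.start(conf);
        sidecar.awaitStopped();
    }
}
```

Setting {{sidecar.health.port}} to {{0}} in this sketch binds an ephemeral port, which is convenient for tests; {{start}} returns the port actually bound.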

h2. Out of scope

* No batching, no queueing, no real inference.
* No integration with any specific model format (that is carried by
 concrete backends added later).
* No security / TLS (tracked separately).

h2. Acceptance criteria

* {{mvn -pl flink-gpu-sidecar -am verify}} passes.
* Starting the sidecar with a minimal configuration reaches {{READY}} state
 within five seconds on a developer laptop.
* {{Ping}} RPC round-trips end-to-end in an integration test.
* Clean shutdown on SIGTERM within the configured grace period.
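
One way an integration test might check the readiness criterion is to poll {{/health}} until it reports {{READY}} or a deadline passes. The sketch below is illustrative only: {{ReadinessProbeSketch}}, {{awaitReady}} and {{httpProbe}} are hypothetical names, and the health URL, timeout and polling interval are left to the caller because the issue does not define them beyond the five-second target.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.function.Supplier;

/** Hypothetical test helper; not part of the proposed module. */
final class ReadinessProbeSketch {

    /** Polls the probe until it returns "READY" or the timeout elapses. */
    static boolean awaitReady(Supplier<String> probe, Duration timeout, Duration interval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if ("READY".equals(probe.get())) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(interval.toMillis());
        }
    }

    /** Builds a probe that GETs the given health URI and returns the trimmed body. */
    static Supplier<String> httpProbe(URI healthUri) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(healthUri)
                .timeout(Duration.ofSeconds(1))
                .GET()
                .build();
        return () -> {
            try {
                return client.send(request, HttpResponse.BodyHandlers.ofString()).body().trim();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return "NOT_READY";
            } catch (Exception e) {
                // Connection refused and similar errors are treated as "not ready yet".
                return "NOT_READY";
            }
        };
    }
}
```

A test could then assert something like {{awaitReady(httpProbe(uri), Duration.ofSeconds(5), Duration.ofMillis(100))}}, where {{uri}} points at the sidecar's health endpoint.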

h2. Affected modules

* New: {{flink-gpu-sidecar}}
* {{flink-dist}} (opt-in launch script)

h2. Links

Parent: see umbrella issue linked to this sub-task.




