damccorm opened a new pull request, #39074:
URL: https://github.com/apache/beam/pull/39074
## Problem
Currently, `ADKAgentModelHandler` only supports deploying and calling remote
model endpoints. This limits its usability in scenarios where the user wants to
deploy a local model (e.g., using a local PyTorch or HuggingFace model) on the
Beam workers.
## Solution
This PR extends `ADKAgentModelHandler` to allow deploying a local model
serving process instead of relying on a remote model endpoint.
Key changes:
1. **Generic SubProcess Wrapper**: Introduced `SubProcessModel` (inherits
from `SubprocessModelHandler`) in `apache_beam/ml/inference/base.py`. This
wrapper can wrap *any* standard `ModelHandler` and serve it via a FastAPI
server running in a subprocess.
2. **FastAPI Subprocess Server**: Added `subprocess_server.py` which runs
in the subprocess. It exposes:
* OpenAI-compatible `/v1/chat/completions` and `/v1/completions`
endpoints (for compatibility with ADK's `LiteLlm` client).
* A transparent pickle-based binary `/v1/beam/inference` endpoint to
preserve exact input/output types when used directly in Beam pipelines.
3. **ADK Integration**: Updated `ADKAgentModelHandler` to:
* Accept an `underlying_model_handler` (which must be a
`SubprocessModelHandler`, such as `SubProcessModel` or
`VLLMCompletionsModelHandler`).
* Manage the lifecycle of the local server process.
* Propagate the local model configuration (endpoint base URL with
dynamically selected port) to the root agent, subagents, and tools.
* Handle dynamic port updates if the server restarts.
* Implement recovery/connectivity checks on inference failures.
4. **Refactoring**: Refactored `vllm_inference.py` to use a new generic
`_VLLMBaseModelHandler` to eliminate code duplication between completions and
chat handlers.
5. **Robustness & Safety**:
* Improved temporary directory lifecycle management to prevent
premature cleanup by GC.
* Bound the subprocess server to `127.0.0.1` instead of `0.0.0.0` for
security.
* Improved error propagation and logging for both client and server
sides.
## Verification
* Added comprehensive unit tests in `base_test.py` (`SubProcessModelTest`)
and `agent_development_kit_test.py` (`TestLocalModelIntegration`).
* Verified all tests pass.
TAG=agy
CONV=b2f450cf-8b63-49a9-8ee8-430ed676e116
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]