Re: [PR] Add embedded NVIDIA Dynamo support to vLLM ModelHandler [beam]

via GitHub Wed, 27 May 2026 12:23:51 -0700


akshayjadiyanv commented on PR #38701:
URL: https://github.com/apache/beam/pull/38701#issuecomment-4557933117


   Thanks Danny. Pushed `6f8849169b2`: `vllm.dockerfile.old` is bumped to 
`apache-beam[gcp]==2.71.0` + matching `COPY` source, with `ai-dynamo[vllm]` and 
`etcd v3.5.13` added. `common.gradle`'s Dynamo IT is uncommented; it now 
inherits `n1-standard-4` from `argMap` (like the existing vLLM ITs) and uses 
`nvidia-tesla-t4` per your suggestion. No separate GPU pool is needed: the GPU 
is requested per job through the `worker_accelerator` experiment.
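
For readers who have not used the per-job GPU setup, here is a rough sketch of the pipeline flags it implies. The helper name `dataflow_gpu_args` and the GPU count of 1 are my own illustrative assumptions, not taken from `common.gradle`. The flag layout follows the Dataflow GPU documentation as I understand it.

```python
def dataflow_gpu_args(machine_type="n1-standard-4",
                      gpu_type="nvidia-tesla-t4",
                      gpu_count=1):
    """Build Dataflow flags that attach a GPU to each worker of one job.

    Hypothetical helper for illustration only; gpu_count=1 is an assumption.
    """
    # The accelerator spec is one semicolon-separated string passed as an experiment.
    accelerator = f"type:{gpu_type};count:{gpu_count};install-nvidia-driver"
    return [
        "--runner=DataflowRunner",
        f"--machine_type={machine_type}",
        f"--experiments=worker_accelerator={accelerator}",
    ]


if __name__ == "__main__":
    for flag in dataflow_gpu_args():
        print(flag)
```

Because the accelerator is requested through a job-level experiment, nothing has to be provisioned ahead of time on the project side.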
   
   Validated end-to-end on Dataflow against an image built from the updated 
dockerfile, with `--sdk_container_image=<my-image>`. Job 
`2026-05-27_10_43_15-18102219387511428172` finished `JOB_STATE_DONE` with 5 
coherent Qwen3-0.6B completions in GCS. The `nvext.timing` field on every 
`Completion` confirms the Dynamo frontend served the request (vanilla vLLM's 
OpenAI server doesn't emit it). Worker logs confirm Tesla T4 attached.
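
The `nvext.timing` check can be automated along these lines. This is only a sketch: it assumes each completion is available as a parsed JSON object with a top-level `nvext` object containing a `timing` key, which is my reading of the dotted name rather than a confirmed schema. The function name `served_by_dynamo` and the sample payloads are hypothetical.

```python
import json


def served_by_dynamo(completion: dict) -> bool:
    """Return True if the completion carries an `nvext.timing` entry.

    Assumes `nvext` is a top-level object in the response (not a confirmed schema).
    """
    nvext = completion.get("nvext")
    return isinstance(nvext, dict) and "timing" in nvext


if __name__ == "__main__":
    # Made-up payloads for illustration; real responses have more fields.
    with_ext = json.loads('{"id": "a", "choices": [], "nvext": {"timing": {}}}')
    without_ext = json.loads('{"id": "b", "choices": []}')
    print(served_by_dynamo(with_ext), served_by_dynamo(without_ext))
```

A check like this could run over the GCS output to assert that every record went through the Dynamo frontend rather than the plain vLLM server.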
   
   **Ask**: when convenient, please rebuild 
`us.gcr.io/apache-beam-testing/python-postcommit-it/vllm:latest` from the 
updated dockerfile so the new Dynamo IT in `common.gradle` (which reuses that 
tag) can actually run in `apache-beam-testing`. Happy to coordinate on a staged 
tag first if you'd prefer.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
