akshayjadiyanv commented on PR #38701:
URL: https://github.com/apache/beam/pull/38701#issuecomment-4628241399

Thanks for pushing the image! The PostCommit ran, but didn't give a clean read on the Dynamo IT, and I don't think it's our test. The 3.12 job was **cancelled at the 4-hour cap, not failed**, and our Dynamo test never actually ran: `vllmTests` runs completion → chat → Dynamo sequentially, and only the first job (native `opt-125m`, job `2026-06-04_15_32_20-13766057237933605706`) was submitted. It hit `RUNNING` at 22:32 and never returned before the cap, so the Dynamo exec never started. 3.11/3.13 passed; 3.10 and 3.14 failed (3.14 was a `libpython3.14` segfault). Is PostCommit Python red on master too right now? Looks like it from the recent runs.

I did validate the Dynamo path independently first: built an image from the updated `vllm.dockerfile.old` in my own GCP project and ran the example on Dataflow (T4, `Qwen3-0.6B`, `--use_dynamo`). It finished `JOB_STATE_DONE` with every `Completion` carrying an `nvext.timing` field, which only the Dynamo frontend emits, so the Dynamo path was definitely exercised. T4 confirmed in worker logs.

A couple of **suggestions**, your call:

- Run/observe just `vllmTests` (or the native `opt-125m` IT) in isolation; the exec-1 hang looks pre-existing.
- Or reorder/split so the Dynamo run isn't starved behind it when the suite runs long.

The job id above should let you pull the `apache-beam-testing` worker logs for why the native job didn't return.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
