[PR] fix: Verify durable cached agent steps match the request before replay [airflow]

via GitHub Wed, 10 Jun 2026 18:02:13 -0700


kaxil opened a new pull request, #68372:
URL: https://github.com/apache/airflow/pull/68372

`durable=True` caches each model response and tool result under positional
keys
([`model_step_{N}`](https://github.com/apache/airflow/blob/57b427449b2cbbf777a1c4cf4eb877a8c4724324/providers/common/ai/src/airflow/providers/common/ai/durable/caching_model.py#L68-L75),

[`tool_step_{N}`](https://github.com/apache/airflow/blob/57b427449b2cbbf777a1c4cf4eb877a8c4724324/providers/common/ai/src/airflow/providers/common/ai/durable/caching_toolset.py#L62-L69)),
and on a hit it returned the cached entry without ever looking at the current
request. So a retry after the operator tweaked the system prompt (or upgraded
the model, or a deploy changed the toolset) replayed responses recorded for the
old conversation against the new agent. No error, nothing above DEBUG in the
logs, just a wrong answer that looks fine. Changing something before retrying
is the normal human workflow, which is what makes this the common path rather
than an edge case.

The fix stores a fingerprint with each cache entry and only replays when it
matches the current request:

- model steps hash the model identity, the message history (minus the
`timestamp`/`run_id`/`conversation_id` fields pydantic-ai regenerates on every
attempt), the settings, and the whole `ModelRequestParameters`, so tool
definitions and output mode are covered too
- tool steps hash the tool name, the args, and the model-issued
`tool_call_id`

On mismatch the step logs a warning and runs live. The `tool_call_id` part
does more work than it looks: ids round-trip through the cache unchanged, but a
live model call mints new ones, so once a model step diverges the downstream
tool entries stop matching as well. I kept the positional keys from #64199
instead of switching to content-addressed ones; verify-on-hit gets the same
invalidation chain without changing the storage layout.

A few decisions worth flagging for review:

- if a request can't be serialized to JSON, the fingerprint is `None` and
that step replays unverified, i.e. the old behavior. I specifically avoided
`default=str` in the digest: hashing `<object at 0x...>` reprs would never
match across processes, which quietly turns replay off forever while the
warning blames the user for changing the agent.
- entries written by older provider versions carry no fingerprint and are
treated as a miss. A provider upgrade is itself a deploy landing between
attempts, so re-running once is the right call even though it costs tokens.
- verification compares requests, not code. Fixing a tool's implementation
between attempts won't invalidate a cached result for an identical call, and
neither will repointing `llm_conn_id` at a different endpoint serving the same
model name. Both documented in the operator guide along with the
delete-the-cache-file escape hatch.

Verified end to end in breeze with a real DAG: two `AgentOperator` tasks on
pydantic-ai's built-in `TestModel`, each with a tool that fails on attempt 1
only. The unchanged task logged `Durable: replayed 2 cached steps (1 model, 1
tool), executed 2 new steps (1 model, 1 tool)` on attempt 2, so only the failed
step re-ran. The second task templates its prompt on `try_number` so the
request changes per attempt; its retry fired the new warning and replayed 0
model steps. The cache file was gone after the run succeeded.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[PR] fix: Verify durable cached agent steps match the request before replay [airflow]

Reply via email to