kaxil opened a new pull request, #68372:
URL: https://github.com/apache/airflow/pull/68372

   `durable=True` caches each model response and tool result under positional 
keys 
([`model_step_{N}`](https://github.com/apache/airflow/blob/57b427449b2cbbf777a1c4cf4eb877a8c4724324/providers/common/ai/src/airflow/providers/common/ai/durable/caching_model.py#L68-L75),
 
[`tool_step_{N}`](https://github.com/apache/airflow/blob/57b427449b2cbbf777a1c4cf4eb877a8c4724324/providers/common/ai/src/airflow/providers/common/ai/durable/caching_toolset.py#L62-L69)),
 and on a hit it returned the cached entry without ever looking at the current 
request. So a retry after the operator tweaked the system prompt (or upgraded 
the model, or a deploy changed the toolset) replayed responses recorded for the 
old conversation against the new agent. No error, nothing above DEBUG in the 
logs, just a wrong answer that looks fine. Changing something before retrying 
is the normal human workflow, which is what makes this the common path rather 
than an edge case.
   
   The fix stores a fingerprint with each cache entry and only replays when it 
matches the current request:
   
   - model steps hash the model identity, the message history (minus the 
`timestamp`/`run_id`/`conversation_id` fields pydantic-ai regenerates on every 
attempt), the settings, and the whole `ModelRequestParameters`, so tool 
definitions and output mode are covered too
   - tool steps hash the tool name, the args, and the model-issued 
`tool_call_id`
   
   On mismatch the step logs a warning and runs live. The `tool_call_id` part 
does more work than it looks: ids round-trip through the cache unchanged, but a 
live model call mints new ones, so once a model step diverges the downstream 
tool entries stop matching as well. I kept the positional keys from #64199 
instead of switching to content-addressed ones; verify-on-hit gets the same 
invalidation chain without changing the storage layout.
   
   A few decisions worth flagging for review:
   
   - if a request can't be serialized to JSON, the fingerprint is `None` and 
that step replays unverified, i.e. the old behavior. I specifically avoided 
`default=str` in the digest: hashing `<object at 0x...>` reprs would never 
match across processes, which quietly turns replay off forever while the 
warning blames the user for changing the agent.
   - entries written by older provider versions carry no fingerprint and are 
treated as a miss. A provider upgrade is itself a deploy landing between 
attempts, so re-running once is the right call even though it costs tokens.
   - verification compares requests, not code. Fixing a tool's implementation 
between attempts won't invalidate a cached result for an identical call, and 
neither will repointing `llm_conn_id` at a different endpoint serving the same 
model name. Both documented in the operator guide along with the 
delete-the-cache-file escape hatch.
   
   Verified end to end in breeze with a real DAG: two `AgentOperator` tasks on 
pydantic-ai's built-in `TestModel`, each with a tool that fails on attempt 1 
only. The unchanged task logged `Durable: replayed 2 cached steps (1 model, 1 
tool), executed 2 new steps (1 model, 1 tool)` on attempt 2, so only the failed 
step re-ran. The second task templates its prompt on `try_number` so the 
request changes per attempt; its retry fired the new warning and replayed 0 
model steps. The cache file was gone after the run succeeded.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to