kaxil opened a new pull request, #68926:
URL: https://github.com/apache/airflow/pull/68926

   ## Summary
   
   `durable=True` on `AgentOperator` / `@task.agent` caches each model response 
and tool
   result so a retry replays completed steps instead of re-running expensive 
LLM calls.
   Until now that cache was always an ObjectStorage JSON file, which required
   `[common.ai] durable_cache_path` to be set.
   
   The durable cache is per-task-instance, per-run, and cleared on success, 
which is exactly
   the scope of the [AIP-103 task state 
store](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/models/task_state_store.py).
   On **Airflow >= 3.3** the cache now lives there:
   
   - No `durable_cache_path` configuration needed.
   - Large step payloads offload automatically when `[workers] 
state_store_backend` is set.
   - Cached steps are deleted on success, and reclaimed by the DAG-run cascade 
if a run fails permanently.
   
   On **Airflow < 3.3** the ObjectStorage backend is unchanged and selected 
automatically.
   
   ## Design
   
   - Both backends implement a shared `DurableStorageProtocol`, so 
`CachingModel` and
     `CachingToolset` depend on the interface, not a concrete backend.
   - `AgentOperator._build_durable_storage` picks the backend by Airflow 
version.
   - Durable keys use a reserved prefix (`__commonai_durable__...`) so they 
cannot collide
     with keys user code writes via `context["task_state_store"]` in the same 
task instance.
   - Entries are written with `NEVER_EXPIRE` so a retry can replay them 
regardless of
     `retry_delay` or the global retention config; `cleanup()` removes only the 
keys this run touched.
   
   ## Gotchas
   
   - On a default 3.3+ install (no `state_store_backend`), cached step values 
are stored in
     the metadata database. For agents with large responses, configure
     `[workers] state_store_backend` to offload them to external storage.
   - A tool passed via `agent_params={"tools": [...]}` is not wrapped by the 
caching toolset
     (only `toolsets=` are), so only its model steps replay. This is existing 
behavior, unchanged here.
   
   ## Verification
   
   Verified end-to-end on a real Airflow 3.3 standalone: a durable 
`AgentOperator` whose tool
   fails on the first attempt replays the cached model step from the task state 
store on the
   retry, and a direct round-trip task confirms cache, replay, and cleanup 
against the live
   Execution API and metadata DB.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to