kaxil commented on code in PR #68372:
URL: https://github.com/apache/airflow/pull/68372#discussion_r3424730362
##########
providers/common/ai/src/airflow/providers/common/ai/durable/caching_model.py:
##########
@@ -67,15 +73,33 @@ async def request(
) -> ModelResponse:
step = self.counter.next_step()
key = f"model_step_{step}"
+ fingerprint = fingerprint_model_request(
Review Comment:
Good catch, you're right. `prepare_request()` merges the model-level
`settings` and applies the profile transforms (thinking, native tools, output
mode) before the provider sees the request, so fingerprinting the raw
`request()` args misses a change that only lives at the model level -- e.g. a
different temperature on the connection between attempts. Fixed: the
fingerprint now comes from `self.wrapped.prepare_request(model_settings,
model_request_parameters)`. The raw args still go to `wrapped.request()`, which
re-runs `prepare_request()` itself; it's pure and idempotent so the second call
is safe. Added a regression test that a model-level settings change invalidates
the cached entry.
##########
providers/common/ai/src/airflow/providers/common/ai/durable/storage.py:
##########
@@ -99,32 +99,46 @@ def _save_cache(self) -> None:
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(json.dumps(self._cache))
- def save_model_response(self, key: str, response: ModelResponse) -> None:
- """Serialize and store a ModelResponse in the cache."""
+ def save_model_response(
+ self, key: str, response: ModelResponse, *, fingerprint: str | None =
None
+ ) -> None:
+ """Serialize and store a ModelResponse with the request fingerprint
that produced it."""
cache = self._load_cache()
- cache[key] = ModelMessagesTypeAdapter.dump_json([response]).decode()
+ data = ModelMessagesTypeAdapter.dump_json([response]).decode()
+ cache[key] = json.dumps({"fingerprint": fingerprint, "data": data})
Review Comment:
No intention -- leftover from wrapping the original string-based format.
Fixed to store the dumped messages as native objects
(`ModelMessagesTypeAdapter.dump_python([response], mode="json")`) so the cache
is JSON-encoded once in `_save_cache` instead of double-encoding the response
payload. Did the same for tool-result entries, keeping a `json.dumps(result)`
probe there so a non-serializable result still skips just that entry instead of
breaking the whole-file write.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]