This is an automated email from the ASF dual-hosted git repository.

kaxilnaik pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
     new be73e4f171a Add AGENTS.md to common AI provider (#62824)
be73e4f171a is described below

commit be73e4f171a6bed7810c0ae1860281405db8a4dc
Author: Kaxil Naik <[email protected]>
AuthorDate: Tue Mar 3 22:08:17 2026 +0000

    Add AGENTS.md to common AI provider (#62824)
---
 providers/common/ai/AGENTS.md | 60 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/providers/common/ai/AGENTS.md b/providers/common/ai/AGENTS.md
new file mode 100644
index 00000000000..2c068d70891
--- /dev/null
+++ b/providers/common/ai/AGENTS.md
@@ -0,0 +1,60 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# Common AI Provider — Agent Instructions
+
+This provider wraps [pydantic-ai](https://ai.pydantic.dev/) to connect Airflow pipelines to LLMs.
+The hook is a thin bridge between Airflow connections and pydantic-ai's model/provider abstractions.
+
+## Design Principles
+
+- **Delegate to pydantic-ai.** pydantic-ai supports 20+ providers (OpenAI, Anthropic, Google, Azure,
+  Bedrock, Ollama, etc.) via `infer_model()` and provider classes like `AzureProvider`, `BedrockProvider`.
+  Do not re-implement provider-specific logic that pydantic-ai handles.
+  Before writing new code, check: https://ai.pydantic.dev/models/
+- **Keep the hook thin.** `PydanticAIHook.get_conn()` maps Airflow connection fields to pydantic-ai
+  constructors. That is the hook's entire job. Do not add abstraction layers (builders, factories,
+  registries, Protocols) on top of pydantic-ai's own abstractions.
+- **No premature abstraction.** Do not add Protocols, builder patterns, or plugin systems for a single
+  code path. Wait until there are 3+ concrete use cases before introducing an abstraction.
+- **Operators stay focused.** Each operator does one thing: `LLMOperator` (prompt → output),
+  `LLMBranchOperator` (prompt → branch decision), `LLMSQLOperator` (prompt → validated SQL).
+
+## Adding Support for a New LLM Provider
+
+If pydantic-ai already supports the provider (check [models docs](https://ai.pydantic.dev/models/)):
+
+1. **Do nothing in this package.** Users set the `provider:model` string in their connection
+   (e.g. `azure:gpt-4o`, `bedrock:anthropic.claude-sonnet-4-20250514`) and the hook resolves it
+   via `infer_model()`.
+2. If the provider needs credentials beyond `api_key` and `base_url`, add a branch in
+   `get_conn()` using pydantic-ai's own provider class (e.g. `AzureProvider`).
+3. Update the connection form docs if new fields are needed.
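
The `provider:model` format in step 1 splits on the first colon, since model names such as `anthropic.claude-sonnet-4-20250514` may contain dots and dashes of their own. A minimal sketch of that split (illustrative; `split_provider_model` is a hypothetical name, and the real resolution happens inside pydantic-ai's `infer_model()`):

```python
def split_provider_model(spec: str) -> tuple[str, str]:
    """Split a 'provider:model' string on the first colon only."""
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    return provider, model
```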
+
+If pydantic-ai does *not* support the provider, contribute upstream to pydantic-ai rather than
+building a wrapper here.
+
+## Security
+
+- **No dynamic imports from connection extras.** Never use `importlib.import_module()` on
+  user-provided strings from connection fields. Connection extras are editable by users with
+  connection-edit permissions and must not become a code execution vector.
+- **SQL validation is on by default.** `LLMSQLOperator` validates generated SQL via AST parsing.
+  Do not disable this default.
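
Both rules above can be sketched in stdlib-only form. This is a hedged illustration, not the provider's actual code: the first function shows the allowlist pattern that replaces dynamic imports, and the second is a crude stand-in for `LLMSQLOperator`'s real AST-based validation.

```python
# Illustrative only: names and the allowlist contents are assumptions.
KNOWN_PROVIDERS = {"openai", "anthropic", "google", "azure", "bedrock", "ollama"}


def resolve_provider(name: str) -> str:
    """Resolve a user-supplied provider name via a fixed allowlist,
    never via importlib, so extras cannot name arbitrary code."""
    if name not in KNOWN_PROVIDERS:
        raise ValueError(f"unsupported provider {name!r}")
    return name


def looks_like_single_select(sql: str) -> bool:
    """Crude stand-in for AST-based SQL validation: accept exactly one
    SELECT statement, reject anything else."""
    stmt = sql.strip().rstrip(";")
    if ";" in stmt:  # more than one statement
        return False
    return stmt.lower().startswith("select")
```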
+
+## Pitfalls
+
+- Do not construct raw provider SDK clients (e.g. `openai.AsyncAzureOpenAI`); use pydantic-ai's
+  provider classes, which handle client construction internally.
+- Do not add provider-specific connection types. The single `pydantic_ai` connection type works for
+  all providers via the `provider:model` format.
+- Use `from airflow.providers.common.compat.sdk import ...` for SDK imports, never
+  `from airflow.sdk import ...` directly.
+
+## Key Paths
+
+- Hook: `src/airflow/providers/common/ai/hooks/pydantic_ai.py`
+- Operators: `src/airflow/providers/common/ai/operators/`
+- Decorators: `src/airflow/providers/common/ai/decorators/`
+- Tests: `tests/unit/common/ai/`
+- Docs: `docs/`
