kaxil opened a new pull request, #67792:
URL: https://github.com/apache/airflow/pull/67792

   The `common.ai` provider runs Pydantic AI agents but emits no telemetry 
about what the agent did: `result.usage()` and the model name are computed and 
then dropped at DEBUG. This wires Pydantic AI's native OpenTelemetry 
instrumentation into the OTLP exporter Airflow core already configures under 
`[traces]`, so agent spans (agent run, model call, tool call, token usage) flow 
to whatever OpenTelemetry backend the deployment runs (Jaeger, Tempo, Grafana, 
Phoenix, Langfuse, a bare collector), correlated to the task that produced them.
   
   Scope is intentionally narrow: export only. No new metadata tables, no 
migration, no native trace store, no bundled UI. Viewing traces is the job of 
the backend the user already runs.
   
   ## How it works
   
   - Instrumentation is attached once, in `PydanticAIHook.create_agent()`, so 
every LLM operator (`AgentOperator`, `@task.agent` / `@task.llm`, and the SQL / 
branch / file-analysis / schema-compare operators) and `LLMRetryPolicy` are 
covered through a single chokepoint.
   - It reuses the global `TracerProvider` that core tracing installs; it never 
configures an exporter or provider of its own. If `[traces] otel_on` is off in 
the worker process, nothing is emitted and there is zero overhead.
   - Parenting is implicit: the worker opens the task span (from 
`TaskInstance.context_carrier`) before `execute()` runs, so the agent spans 
nest under it and inherit the task's `trace_id` and `airflow.*` attributes. The 
operator does not re-extract the carrier.
   - Span flushing is already handled by the worker's existing teardown, so no 
operator changes were needed.
   
   ## Design rationale
   
   - **The `agent.instrument` property, not `Agent(instrument=...)`**: the 
constructor kwarg is deprecated in current pydantic-ai; the property is the 
stable per-agent surface back to the `pydantic-ai-slim>=1.71` floor.
   - **`version=4`**: pins the OpenTelemetry GenAI semantic-convention 
attribute names (`gen_ai.*`) so they do not drift with the pydantic-ai default.
   - **Provider config, not core `[traces]`**: the transport is shared, but the 
decision to emit GenAI spans and what content to capture is provider-owned.
   
   ## Usage
   
   ```ini
   [traces]
   otel_on = True
   
   [common.ai]
   otel_export_enabled = True
   ```
   
   Point the exporter with the standard OpenTelemetry environment variables. 
For an OTLP/HTTP collector:
   
   ```bash
   export OTEL_TRACES_EXPORTER="otlp_proto_http"
   export 
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://otel-collector:4318/v1/traces";
   ```
   
   ## Content capture and privacy
   
   `capture_content` (default off) controls whether prompt, completion, and 
tool-call text is attached to spans. With it off, only token counts, model id, 
latency, tool names, and finish reason are recorded. When on, that content is 
exported to the tracing backend without redaction: Airflow's secret masking is 
a logging filter and does not apply to span attributes. Enable it only for 
debugging in a trusted environment.
   
   ## Deferred follow-ups
   
   - `otel_exporter_endpoint` to route GenAI spans to a different collector 
than infra traces (needs a second span processor on the provider).
   - A `redactor` hook to scrub captured content before export.
   - Tracing the non-pydantic-ai operators (`LlamaIndexRetrievalOperator`, 
`DocumentLoaderOperator`), which do not build through `create_agent()`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to