[GH] (airflow/main): Workflow run "Tests (AMD)" failed!

GitBox Fri, 05 Jun 2026 17:51:47 -0700


The GitHub Actions job "Tests (AMD)" on airflow.git/main has failed.
Run started by GitHub user potiuk (triggered by bugraoz93).


Head commit for run:
db6ce848baf69fc74e0d1d6337da6b84ff749f3b / Vikram Koka <[email protected]>
Improve AIP progress tracker example for accuracy (#68037)

* Improve AIP progress tracker example DAG to produce accurate,
  evidence-backed results

The example DAG was producing hallucinated output -- fabricated completion
percentages, invented blockers, and missed shipped work -- because the
evidence pipeline was too thin and the prompts too permissive.

Key changes:
- Add AIP registry with Confluence page IDs, GitHub search aliases, and
  codebase directory paths for multi-strategy evidence gathering
- Fetch GitHub file tree (Git Trees API) for codebase-level evidence
- Replace flat 3000-char spec truncation with section-aware parsing
- Replace completion_pct/blockers Pydantic model with per-deliverable
  DeliverableStatus (name, status, evidence, confidence)
- Add grounding rules to analysis/synthesis/validation system prompts
- Add three-layer quality pipeline: AI validation (LLMOperator) identifies
  ungrounded claims, deterministic apply_validation task does mechanical
  find-and-replace, human reviews the corrected report
- Add arithmetic validation that cross-checks X/Y fractions against
  structured analysis data (catches validator-introduced errors)
- Set temperature=0 on all LLM calls for run-to-run consistency
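
A minimal sketch of the per-deliverable structure described in the list above. The commit uses a Pydantic model; this version uses a stdlib dataclass as a stand-in so it runs without extra dependencies. Only the four field names come from the commit message. The field types, the status vocabulary, and the `progress_fraction` helper are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical status vocabulary; the commit message does not list the allowed values.
ALLOWED_STATUSES = {"done", "in_progress", "not_started"}


@dataclass(frozen=True)
class DeliverableStatus:
    """Stand-in for the Pydantic model; field names are from the commit message."""

    name: str
    status: str
    evidence: str
    confidence: str  # assumed to be a label such as "high" or "low"

    def __post_init__(self) -> None:
        # Reject statuses outside the assumed vocabulary.
        if self.status not in ALLOWED_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        # Every assessment must carry some evidence text.
        if not self.evidence.strip():
            raise ValueError(f"deliverable {self.name!r} has no evidence")


def progress_fraction(deliverables: list[DeliverableStatus]) -> str:
    """Derive an X/Y fraction from the structured data (hypothetical helper)."""
    done = sum(1 for d in deliverables if d.status == "done")
    return f"{done}/{len(deliverables)}"
```

Computing the fraction from the structured records, rather than asking the model for a percentage, is one way to keep progress figures tied to evidence.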

* Add skills-based AIP tracker DAG alongside the pipeline version

Same file now contains two DAGs that solve the same use case -- tracking
AIP implementation progress -- with different architectures:

1. example_aip_progress_tracker (pipeline): 12-task deterministic pipeline
   with per-AIP LLM analysis, structured Pydantic output, AI validation,
   and arithmetic correction. More accurate, more auditable, fewer tokens
   (~66K total), but more complex.

2. example_aip_progress_tracker_skills (agent): Single AgentOperator with
   the aip-tracker skill loaded via AgentSkillsToolset plus custom tool
   functions for Confluence/GitHub APIs. Simpler DAG (2 tasks), but less
   control over output discipline (~82K tokens, coarser granularity).

The aip-tracker SKILL.md bundle teaches the agent the same grounding
rules the pipeline enforces structurally: spec-level deliverable
granularity, fraction-only progress format, evidence-backed assessments,
and a mandatory self-verification checklist.

Also strengthens the pipeline DAG's arithmetic validation to cross-check
per-AIP fractions and summary totals against structured analysis data.
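
The cross-check described above could look roughly like the following. This is a sketch under stated assumptions, not the code from the DAG: the function names, the regular expression, and the single-line report format are all hypothetical, and the `done`/`total` counts are assumed to come from the structured analysis data.

```python
import re

# Matches X/Y fractions such as "3/5". Assumed report format; a real report
# would need care to avoid matching dates or paths.
FRACTION = re.compile(r"\b(\d+)/(\d+)\b")


def find_fraction_mismatches(report_line: str, done: int, total: int) -> list[str]:
    """Return every X/Y in report_line that disagrees with the structured counts."""
    expected = (done, total)
    return [
        match.group(0)
        for match in FRACTION.finditer(report_line)
        if (int(match.group(1)), int(match.group(2))) != expected
    ]


def correct_fractions(report_line: str, done: int, total: int) -> str:
    """Deterministically rewrite every X/Y in report_line to the structured counts."""
    return FRACTION.sub(f"{done}/{total}", report_line)
```

For example, calling `find_fraction_mismatches` on a line claiming 4 of 5 deliverables when the structured data records 3 of 5 flags that fraction, and `correct_fractions` rewrites it without involving the LLM.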

* Removed duplicate import and redundant definition

Based on feedback from Kaxil, removed the duplicate import of `re` and the
redundant definition of `_github_headers`.

* Updated example to fix CI errors

Fix mypy errors in AIP tracker skills DAG for _safe_api_get return type

  Narrow type guard from `isinstance(data, str)` to `not isinstance(data, dict)`
  so mypy recognizes that `.get()` calls are valid after the check, since
  `_safe_api_get` returns `dict | list | str`.
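
  A minimal sketch of the narrowing described above. The body of
  `_safe_api_get` here is a hypothetical stand-in that just returns its
  argument, and `extract_title` with its "title" key is invented for
  illustration. Only the return type and the two guard expressions come
  from the commit message.

```python
from typing import Union

# Equivalent to the `dict | list | str` return type named in the commit message.
ApiResult = Union[dict, list, str]


def _safe_api_get(payload: ApiResult) -> ApiResult:
    """Hypothetical stand-in: the real helper calls an API; this one echoes its input."""
    return payload


def extract_title(payload: ApiResult) -> str:
    data = _safe_api_get(payload)
    # `isinstance(data, str)` would only exclude str, leaving `dict | list`,
    # so mypy would still reject `.get()`. Excluding everything that is not a
    # dict narrows `data` to `dict` for the rest of the function.
    if not isinstance(data, dict):
        return ""
    return str(data.get("title", ""))
```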

Report URL: https://github.com/apache/airflow/actions/runs/27038684613

With regards,
GitHub Actions via GitBox


