The GitHub Actions job "Tests (AMD)" on airflow.git/main has failed. Run started by GitHub user potiuk (triggered by bugraoz93).
Head commit for run:
db6ce848baf69fc74e0d1d6337da6b84ff749f3b / Vikram Koka <[email protected]>
Improve AIP progress tracker example for accuracy (#68037)

* Improve AIP progress tracker example DAG to produce accurate, evidence-backed results

The example DAG was producing hallucinated output -- fabricated completion percentages, invented blockers, and missed shipped work -- because the evidence pipeline was too thin and the prompts too permissive.

Key changes:
- Add AIP registry with Confluence page IDs, GitHub search aliases, and codebase directory paths for multi-strategy evidence gathering
- Fetch GitHub file tree (Git Trees API) for codebase-level evidence
- Replace flat 3000-char spec truncation with section-aware parsing
- Replace completion_pct/blockers Pydantic model with per-deliverable DeliverableStatus (name, status, evidence, confidence)
- Add grounding rules to analysis/synthesis/validation system prompts
- Add three-layer quality pipeline: AI validation (LLMOperator) identifies ungrounded claims, deterministic apply_validation task does mechanical find-and-replace, human reviews the corrected report
- Add arithmetic validation that cross-checks X/Y fractions against structured analysis data (catches validator-introduced errors)
- Set temperature=0 on all LLM calls for run-to-run consistency

* Add skills-based AIP tracker DAG alongside the pipeline version

Same file now contains two DAGs that solve the same use case -- tracking AIP implementation progress -- with different architectures:

1. example_aip_progress_tracker (pipeline): 12-task deterministic pipeline with per-AIP LLM analysis, structured Pydantic output, AI validation, and arithmetic correction. More accurate, more auditable, fewer tokens (~66K total), but more complex.

2. example_aip_progress_tracker_skills (agent): Single AgentOperator with the aip-tracker skill loaded via AgentSkillsToolset plus custom tool functions for Confluence/GitHub APIs. Simpler DAG (2 tasks), but less control over output discipline (~82K tokens, coarser granularity).

The aip-tracker SKILL.md bundle teaches the agent the same grounding rules the pipeline enforces structurally: spec-level deliverable granularity, fraction-only progress format, evidence-backed assessments, and a mandatory self-verification checklist.

Also strengthens the pipeline DAG's arithmetic validation to cross-check per-AIP fractions and summary totals against structured analysis data.

* Removed duplicate import and redundant definition

Based on feedback from Kaxil, removed the duplicate import of re and resolved the redundant definition of _github_headers

* Updated example to fix CI errors

Fix mypy errors in AIP tracker skills DAG for _safe_api_get return type

Narrow type guard from `isinstance(data, str)` to `not isinstance(data, dict)` so mypy recognizes that `.get()` calls are valid after the check, since `_safe_api_get` returns `dict | list | str`.

Report URL: https://github.com/apache/airflow/actions/runs/27038684613

With regards,
GitHub Actions via GitBox
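
For readers unfamiliar with the type-guard change described in the last commit-message item, the following is a minimal sketch of the idea, not the code from the commit. Only the name `_safe_api_get` and its `dict | list | str` return type come from the message; the function body, the `page_title` helper, and the example URLs are hypothetical stand-ins.

```python
from __future__ import annotations


def _safe_api_get(url: str) -> dict | list | str:
    """Hypothetical stand-in: returns parsed JSON (dict or list) or an error string."""
    if url.endswith("/missing"):
        return "error: 404"
    if url.endswith("/list"):
        return [{"title": "item"}]
    return {"title": "AIP page", "id": "123"}


def page_title(url: str) -> str | None:
    """Hypothetical caller that needs dict access on the response."""
    data = _safe_api_get(url)
    # Guarding with `isinstance(data, str)` would leave `dict | list` here,
    # so a type checker rejects `.get()` (lists have no such method).
    # `not isinstance(data, dict)` narrows `data` to `dict` after the guard.
    if not isinstance(data, dict):
        return None
    return data.get("title")
```

With this guard, both the error-string case and the list case return early, so the `.get()` call is only reached with a `dict`.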

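
The arithmetic validation mentioned in the commit message (cross-checking X/Y fractions against structured analysis data) could look roughly like the sketch below. This is an illustration under assumptions, not the DAG's implementation: the `AIP-N: X/Y` report format, the `find_fraction_mismatches` name, and the sample AIP numbers and counts are all made up for the example.

```python
from __future__ import annotations

import re

# Assumed report format: "AIP-<number>: <done>/<total>" (hypothetical).
FRACTION_RE = re.compile(r"(AIP-\d+):\s*(\d+)/(\d+)")


def find_fraction_mismatches(
    report: str, expected: dict[str, tuple[int, int]]
) -> list[str]:
    """Compare every X/Y fraction in the report with the structured counts."""
    problems = []
    for match in FRACTION_RE.finditer(report):
        aip = match.group(1)
        done, total = int(match.group(2)), int(match.group(3))
        if aip not in expected:
            problems.append(f"{aip}: no structured data for {done}/{total}")
        elif (done, total) != expected[aip]:
            want_done, want_total = expected[aip]
            problems.append(
                f"{aip}: report says {done}/{total}, "
                f"analysis says {want_done}/{want_total}"
            )
    return problems


# Made-up sample data: the second fraction disagrees with the analysis.
sample_report = "AIP-101: 3/5 deliverables done\nAIP-102: 2/4 deliverables done"
sample_expected = {"AIP-101": (3, 5), "AIP-102": (1, 4)}
mismatches = find_fraction_mismatches(sample_report, sample_expected)
```

Because the check is deterministic, it can flag a fraction that an LLM validation step altered incorrectly, which is the failure mode the commit message calls "validator-introduced errors".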