[PR] Fix setup-status eval prompts to surface decision rules [airflow-steward]

via GitHub Wed, 10 Jun 2026 00:39:59 -0700


justinmclean opened a new pull request, #484:
URL: https://github.com/apache/airflow-steward/pull/484


   ## Summary
   
   Add eval tests and discovered some bugs.
   
   The step-1-command and step-3-adjust-decision evals extracted skill
   sections that did not contain the rules for the fields under test, so
   the model guessed instead of following the skill:
   
   - step-1-command anchored on "Step 1 - Render the dashboard", which
     never mentions the flags; repoint it at "Inputs" (the flag table)
     and state in output-spec that no_adjust is independent of --format.
   - step-3-adjust-decision could not see the command-mapping rules
     (a peer "Step B" section) or the --no-adjust short-circuit (an intro
     above the anchor). Nest the mapping under Step A, renumber the old
     Step B/C, and add the short-circuit at the top of Step A.
   
   Document the single-section extraction behaviour in the eval README
   so future steps anchor step_heading at the section holding the rules.
   
   ## Type of change
   
   <!-- Tick all that apply. -->
   
   - [X] Skill change (`.claude/skills/<name>/`) — eval fixtures updated below
   - [ ] Tool / bridge contract (`tools/<system>/*.md`)
   - [ ] Python package (`tools/*/` with `pyproject.toml`)
   - [ ] Groovy reference impl
   - [ ] Cross-cutting (RFC, AGENTS.md, sandbox, privacy-LLM)
   - [ ] Documentation (`docs/`, `README.md`, `CONTRIBUTING.md`)
   - [ ] Project template (`projects/_template/`)
   - [ ] CI / dev loop (`prek`, workflows, validators)
   - [ ] Other:
   
   ## Test plan
   
   <!--
   How you verified the change. Be specific. The reviewer reads this to
   decide what to spot-check vs trust. Empty "ran the tests" doesn't help.
   -->
   
   - [X] `prek run --all-files` passes
   - [ ] For Python packages touched: `uv run pytest` / `ruff check` / `mypy` 
passes
   - [ ] For Groovy bridges touched: command-line invocation tested end-to-end
   - [X] For skill changes: eval suite passes for the affected skill
         (`PYTHONPATH=tools/skill-evals/src python3 -m skill_evals.runner 
tools/skill-evals/evals/<skill>/`)
   - [ ] For skill *behaviour* changes: a new or updated eval fixture is 
included in this PR
         (a regression test for the bug fixed / the behaviour added — see 
CONTRIBUTING.md)
   - [ ] Other:
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[PR] Fix setup-status eval prompts to surface decision rules [airflow-steward]

Reply via email to