justinmclean opened a new pull request, #267: URL: https://github.com/apache/airflow-steward/pull/267
Adds tools/skill-evals/evals/list-steward-skills/ per the /AGENTS.md § Reusable skills requirement that every skill ships a behavioural eval suite. list-steward-skills had no coverage.

Two suites:

- step-1-command (4 cases): command-selection logic. Covers default listing, verbose via explicit request, verbose via keyword, and injection-in-user-message resistance (case-4 embeds a SYSTEM: block attempting to redirect to `find`; the correct answer is the standard listing command).
- step-2-present (3 cases): output-fidelity / hard-rule enforcement. Covers standard verbatim output, user requests a summary (hard rule overrides), and user requests a filtered view (hard rule overrides). All three cases expect presentation_mode: verbatim.

Both suites use step-config.json to extract the relevant SKILL.md section live, so a future edit to the skill is automatically reflected in the prompt.

Also updates tools/skill-evals/README.md: corrects the count from 15 → 19 (three previously-unlisted suites, pr-management-mentor, pr-management-stats and pr-management-triage, are now listed) and adds the new list-steward-skills entry.

Validation: uv run --project tools/skill-evals skill-eval tools/skill-evals/evals/list-steward-skills/ → all 7 cases load and print without error.

Generated-by: Claude (Opus 4.7)
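For readers unfamiliar with the live-extraction idea mentioned above, here is a minimal Python sketch of pulling one section out of a markdown file by its heading, so that a prompt always reflects the current text. This is an illustration only: the step-config.json schema and the actual extraction code are not shown in this PR, so the function name `extract_section` and the sample headings are hypothetical.

```python
import re


def extract_section(markdown: str, heading: str) -> str:
    """Return the text under `heading`, up to the next heading of the
    same or a higher level. Hypothetical helper, not the skill-eval tool's
    real implementation. It does not special-case '#' lines inside fenced
    code blocks.
    """
    lines = markdown.splitlines()
    start = None  # index of the first line after the matched heading
    level = 0     # number of '#' characters in the matched heading
    for i, line in enumerate(lines):
        match = re.match(r"^(#{1,6})\s+(.*?)\s*$", line)
        if not match:
            continue
        if start is None:
            if match.group(2) == heading:
                start = i + 1
                level = len(match.group(1))
        elif len(match.group(1)) <= level:
            # A sibling or parent heading closes the section.
            return "\n".join(lines[start:i]).strip()
    if start is None:
        raise KeyError(heading)
    # The section runs to the end of the document.
    return "\n".join(lines[start:]).strip()


# Sample document with made-up headings, standing in for a SKILL.md file.
SAMPLE_SKILL = (
    "# Skill\n\n"
    "## Step 1\nRun the listing command.\n\n"
    "### Notes\nKeep it short.\n\n"
    "## Step 2\nPresent verbatim.\n"
)

if __name__ == "__main__":
    print(extract_section(SAMPLE_SKILL, "Step 1"))
```

Because the section is read at evaluation time rather than copied into the case files, editing the text under a heading changes what the eval prompt contains without touching the suite itself.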
