justinmclean opened a new pull request, #251:
URL: https://github.com/apache/airflow-steward/pull/251

   > **Generated by the spec-driven build loop.** This skill and its eval suite
   > were produced by an autonomous run of `tools/spec-loop` (`./loop.sh` — one
   > work item, one branch, one PR), and this PR doubles as an end-to-end test of
   > that loop. Authored by Claude (see the `Generated-by` commit trailer) and
   > reviewed by a human before submission — please review accordingly.
   
   ## What
   
   Adds `pairing-self-review`, the first Pairing-mode skill: a strictly read-only
   pre-flight self-review that runs in the developer's own loop, after local
   changes are ready but before a PR is opened. It diffs the working branch
   against a configurable base (default: the merge base of `HEAD` and the
   upstream default branch), classifies findings across **correctness**,
   **security**, and **conventions** axes (each `blocking` or `advisory`), and
   returns a structured report. No state changes: it never opens a PR, pushes,
   comments, or mutates the working tree.
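
As a rough illustration of the report shape described above, here is a minimal Python sketch. It is not taken from `SKILL.md`: the `Finding` fields, the `build_report` layout, the `origin/main` default, and the helper names are all assumptions made for the example. Only the three axes, the two severities, and the merge-base default come from this description. The git helpers only build argument lists for read-only commands and never execute anything.

```python
from dataclasses import dataclass
from enum import Enum


class Axis(Enum):
    CORRECTNESS = "correctness"
    SECURITY = "security"
    CONVENTIONS = "conventions"


class Severity(Enum):
    BLOCKING = "blocking"
    ADVISORY = "advisory"


@dataclass(frozen=True)
class Finding:
    # Hypothetical fields; the real report schema is defined by the skill.
    axis: Axis
    severity: Severity
    location: str  # e.g. "path/to/file.py:42"
    message: str


def merge_base_command(upstream: str = "origin/main") -> list[str]:
    """Argument list for the read-only command that prints the merge base.

    "origin/main" is an assumed name for the upstream default branch.
    """
    return ["git", "merge-base", "HEAD", upstream]


def diff_command(base: str) -> list[str]:
    """Argument list for a read-only diff of HEAD against the chosen base."""
    return ["git", "diff", base, "HEAD"]


def build_report(findings: list[Finding]) -> dict:
    """Group findings by axis and severity without touching any state."""
    grouped = {
        axis.value: {sev.value: [] for sev in Severity} for axis in Axis
    }
    for f in findings:
        grouped[f.axis.value][f.severity.value].append(
            f"{f.location}: {f.message}"
        )
    return {
        "has_blocking": any(f.severity is Severity.BLOCKING for f in findings),
        "findings": grouped,
    }
```

Under these assumptions, a caller would check `report["has_blocking"]` before deciding whether the branch is ready for a PR. Every axis is present in the output even when it has no findings.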
   
   Also flips the Pairing row in `docs/modes.md` from proposed/0 to
   experimental/1 and adds the skill to the mode's table.
   
   ## Why
   
   Pre-flight self-review keeps implementation-detail chatter out of the eventual
   human review, so the maintainer conversation stays on design and trade-offs.
   It's the read-only counterpart to `pr-management-code-review`, which runs once
   a PR is already open.
   
   ## Changes
   
   - `.claude/skills/pairing-self-review/SKILL.md` — the skill.
   - `tools/skill-evals/evals/pairing-self-review/` — a 9-case eval suite.
   - `docs/modes.md` — Pairing row + skill table.
   
   ## Testing
   
   - `skill-validator` passes (no hard violations, no soft advisories).
   - All 9 eval cases assemble; the runner extracts each prompt live from
     `SKILL.md`, so a clean assembly confirms fixtures and skill text are in sync.
   - Exercised end-to-end against a real diff with planted issues: it caught a
     SQL injection and a correctness regression, and when a comment in the diff
     ordered the reviewer to suppress findings, it treated the comment as data
     and flagged it rather than obeying. No false positives on clean changes.
   
   ## Notes
   
   - Ships with its eval suite, per the rule that a skill without one is
     incomplete.
   - Read-only by design: generating the report is the only action.

