Re: [PR] contributor-activity-sweep skill with eval suite [airflow-steward]

via GitHub Mon, 25 May 2026 19:13:20 -0700


justinmclean commented on PR #228:
URL: https://github.com/apache/airflow-steward/pull/228#issuecomment-4539130129


   Pre-flight self-review — PR #228 (contributor-activity-sweep)
     
     https://github.com/apache/airflow-steward/pull/228 · draft · author: 
     justinmclean
   
     Base: main · Files changed: 37 (~all added) · Diff size: +756 / −0
   
     A new Pairing/Triage-flavoured read-only activity card with a 12-case eval
     suite (step-0 input validation × 4, step-1 review classification × 5, 
step-2
     render × 3), plus a 9-line tweak to contributor-nomination/render.md and a
     2-line tweak to tools/skill-evals/README.md.
   
     Correctness
   
     - [advisory] SKILL.md heading vs eval directory naming. The eval directory 
is
     step-1-classify-reviews, but its step-config.json extracts SKILL.md's ## 
Step 
     1 — Fetch GitHub activity section. Not a bug — that section does contain 
the
     substantive/LGTM classification rules (comments.totalCount >= 3 …), so the
     eval reads the right text — but the names disagree at a glance and a future
     reader will trip on it. Consider renaming the heading to ## Step 1 — Fetch 
and
      classify activity, or split into a Step 1 (Fetch) + Step 2 (Classify) so
     directory names match headings.
   
     (Verified clean: eval output-spec keys match expected.json keys exactly 
across
      all three suites — no schema drift like the one I caught in 
     pairing-self-review. The classify-stream's keys are 
injection_attempt_detected
      / lgtm_only_reviews / substantive_reviews / total_inline_comments / 
     total_reviews, consistent across all 5 cases.)
   
     Security
   
     No findings. Strong injection-guard callout in the SKILL.md (2 matches).
     Comprehensive adversarial coverage in the eval suite:
   
     - step-0 case-2-unsafe-handle — path-traversal rejection on the GitHub 
login.
     - step-0 case-4-shell-metachar — shell-metacharacter rejection.
     - step-1 case-5-injection-in-body — embedded SYSTEM: instruction in a 
review
     body must be flagged and not obeyed; classification driven by actual 
content,
     not the inflated payload.
     - step-2 case-2-injection-flagged — AGENT OVERRIDE in a PR title must be
     flagged with no verdict language emitted.
   
     That's a solid four security-adversarial cases for a 12-case suite. 
Read-only
     skill, no GitHub mutations.
   
     Conventions
   
     - [advisory] skill-validate --strict flags one SOFT-category violation on 
this
      skill:
     ▎ .../contributor-activity-sweep/SKILL.md:1: action-inventory in 
description 
     (5 commas) — consider moving the enum to body: 'Output is intentionally 
     limited to GitHub-visible activity — mailing list, docum…'
     ▎ The frontmatter description enumerates off-GitHub channels ("mailing 
list, 
     documentation, user support, mentoring, talks, and release management"). 
Under
      non-strict this is a warning, not a failure. Move that enum to the body 
to 
     keep the matching-layer description tight. (Same finding shape as the 
     pre-existing one on security-tracker-stats-dashboard.)
   
     (Verified clean: SPDX header present; ## Step headings well-formed; eval 
     step-config.json files point at real SKILL.md sections; full eval suite 
ships 
     per the AGENTS.md rule. The unrelated 2nd validator violation is 
     security-tracker-stats-dashboard, pre-existing.)
   
     Summary
   
     Ready to push with one small tightening — no blocking findings. The 
--strict
     description-comma warning is the kind of thing the maintainer review will
     mention anyway; rename consideration on the Step 1 heading is judgment.
     
     Blocking: 0  Advisory: 2


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] contributor-activity-sweep skill with eval suite [airflow-steward]

Reply via email to