choo121600 commented on issue #118:
URL: 
https://github.com/apache/airflow-steward/issues/118#issuecomment-4428619543

   # Follow-up audit: principle adherence after the trim wave
   
   I ran the audit principles from this issue against `main` (HEAD `ee0fa4c`) 
after the trim PRs (#119–#126, plus the earlier #127/#128) landed. Two 
takeaways:
   
   1. **Quantitative gain** — total frontmatter cost across the 20 framework 
skills dropped from ~5,955 → ~5,371 tokens per turn (-584, ≈10%). The largest 
combined `description + when_to_use` is now 1,233 chars (was 1,514), well under 
the 1,536 budget.
   2. **Qualitative — partial principle adherence** — the "keep 
router-essential, push detail to body" rule is well-observed in some PRs but 
residual violations survived in several. Notably the two strictest categories 
(*Distinct-from* and *Long-example*) persisted even in skills that were 
nominally "trimmed."
   
   ## Methodology
   
   - Token counts: `tiktoken cl100k_base` as a proxy for Claude's tokenizer 
(±5–10% on English text).
   - Char counts for the budget check: raw `description + when_to_use` 
block-scalar bodies.
   - Principle audit: each line classified per the issue's criteria — **Keep** 
(trigger phrases, routing cues, high-signal task description) vs 
**Move-to-body** (rationale, implementation notes, long examples, distinct-from 
explanations, criteria-source paths, sub-step descriptions).
   - Trigger-phrase preservation check: eyeball-only for now — proposing 
automation below.
   
   ## Current state (HEAD `ee0fa4c`)
   
   ### Tokens & budget — all under budget
   
   <details>
   <summary>Full table (20 skills, sorted by frontmatter tokens desc)</summary>
   
   | Skill | FM tok | desc+when chars |
   |---|---:|---:|
   | security-issue-triage | 315 | 1103 |
   | security-issue-import-from-pr | 305 | 1036 |
   | security-issue-import | 304 | 1073 |
   | setup-steward | 302 | 1040 |
   | pr-management-code-review | 299 | 944 |
   | pr-management-triage | 287 | 900 |
   | write-skill | 280 | 1036 |
   | security-issue-invalidate | 279 | 1025 |
   | setup-isolated-setup-install | 274 | 962 |
   | setup-isolated-setup-update | 273 | 996 |
   | security-issue-import-from-md | 271 | 975 |
   | setup-isolated-setup-verify | 268 | 1024 |
   | pr-management-mentor | 267 | 972 |
   | security-issue-fix | 260 | 881 |
   | security-cve-allocate | 257 | 813 |
   | setup-override-upstream | 246 | 892 |
   | setup-shared-config-sync | 241 | 831 |
   | security-issue-deduplicate | 240 | 849 |
   | security-issue-sync | 215 | 739 |
   | pr-management-stats | 188 | 604 |
   | **TOTAL** | **5,371** | — |
   
   </details>
   
   Headroom is comfortable — the principle, not the budget, is the binding 
constraint from here on.
   
   ### Principle adherence audit
   
   **CLEAN — 8 skills**: `pr-management-mentor` (#119), `pr-management-stats` 
(baseline-clean), `pr-management-triage` (#125), `security-cve-allocate` (#124, 
one minor `(process step 6)` paren), `security-issue-fix` (#122, one minor 
`process step 9 of README.md` ref), `security-issue-sync`, 
`security-issue-triage` (#120 — exemplary "Skip when … invoke `/X` directly" 
routing block), `setup-isolated-setup-install` (#121).
   
   **MODERATE — 12 skills with at least one residual violation**:
   
   | Skill | Trimmed? | Residual violation | Example |
   |---|---|---|---|
   | `pr-management-code-review` | no | Distinct-from + parenthetical 
criteria-source | "Distinct from `pr-management-triage` (which decides 
*whether* to engage); this skill runs **after** …" |
   | `security-issue-deduplicate` | no | Rationale paren + sub-step + criteria 
source | "(typically discovered independently by two reporters …)" / "Step 2a 
surfaces a STRONG match …" |
   | `security-issue-import` | no | Rationale narrative | "This is the first 
step of the handling process: the entry point that converts an inbound email 
thread into a tracker the rest of the skills … operate on." |
   | `security-issue-import-from-md` | #126 | Distinct-from + long-example | 
"Unlike `security-issue-import` (Gmail) and `security-issue-import-from-pr` 
(public PR), there is no inbound reporter …" / "Typical sources: AI security 
review output, third-party SAST report …" |
   | `security-issue-import-from-pr` | no | Rationale paren + sub-step + chain 
handoff | "(the team-deliberate import implies the security assessment has 
already happened)" / "ready for `security-cve-allocate` to take over." |
   | `security-issue-invalidate` | no | Sub-step inventory + criteria source + 
rationale parens | "apply the `invalid` label, remove the scope label, post a 
short closing comment, archive …" / "(a separate Vulnogram REJECT flow is 
required first)" |
   | `setup-isolated-setup-update` | no | Sub-step enum + rationale | "Reports 
framework-checkout updates (…), pinned-tool upgrade candidates (…), drift 
between …" / "Recommended cadence per the doc: once per Claude Code upgrade or 
once a month …" |
   | `setup-isolated-setup-verify` | no | Criteria-source path + checklist enum 
+ rationale parens | "Walk the verification checklist documented in 
`docs/setup/secure-agent-setup.md`" / "(the "did a denial silently turn into an 
allow?" canary)" |
   | `setup-override-upstream` | #128 | Full sub-step workflow + post-merge 
note | "Lists the adopter's overrides, helps pick one, reads it alongside the 
framework skill it modifies, decides whether the change is generalisable, 
designs the framework-level abstraction, implements it …" |
   | `setup-shared-config-sync` | #127 | git-rebase rationale (the "never X" 
scope statements are borderline) | "Runs `git pull --rebase` first if the local 
checkout is behind, so a push never overwrites concurrent work from another 
machine." |
   | `setup-steward` | #123 | Sub-actions enum duplicates `argument-hint` | 
"Sub-actions: `/setup-steward` — …, `/setup-steward upgrade` — …" vs 
`argument-hint: "[adopt\|upgrade\|verify\|override skill-name\|unadopt]"` |
   | `write-skill` | no | Sub-step enum + criteria-source paths | "Walks the 
user through the framework-specific skill shape — YAML frontmatter (…), bundled 
resources (…), placeholder convention (…), … and the Privacy-LLM gate-check 
boilerplate." |
   
   ## Five reusable violation patterns
   
   The residuals cluster into 5 patterns. Codifying these in `write-skill` 
would make new skills land principle-compliant by construction:
   
   1. **Action-inventory in description** — "Does A, B, C, D, E, F, G, H." 
Reduce to verb + outcome; move the enum to body unless it *is* the high-signal 
contract.
   2. **Distinct-from-sibling-skill** — "Unlike X / Distinct from Y / 
Counterpart to Z." The router needs the *trigger* and *skip-when for the wrong 
sibling*, not the comparison. `security-issue-triage` after #120 shows the 
right shape: `Skip when team consensus … has already landed — invoke /X 
(VALID), /Y (INFO-ONLY) directly` — a routing redirect, not a comparison.
   3. **Chain-handoff narrative** — "Finishes by handing off to X" / "ready for 
Y to take over." Body content; the chain doesn't change which skill the router 
picks.
   4. **Parenthetical rationale** — "(typically …)", "(a separate REJECT flow 
is required first)", "(the team-deliberate import implies …)". The router needs 
*whether*, not *why*.
   5. **Criteria-source path** — "per process step 6", 
"`docs/setup/secure-agent-setup.md`", "Step 2a". The router doesn't open the 
doc.
   
   ## Proposed follow-up
   
   With this issue closed, treating this as a retrospective — happy to file a 
fresh issue if that's the cleaner home.
   
   - **Codify the 5 patterns in `write-skill`** — its own frontmatter currently 
violates patterns 1 and 5 (the 
YAML/resources/placeholder/preamble/injection/Privacy-LLM enum and the 
validator path), so it's a natural first PR: dogfood the rule and ship the doc 
together.
   - **Add a non-regression check for trigger phrases** — extract quoted 
strings from `when_to_use` before/after a trim PR; the set must be 
equal-or-superset of `main`. Could live as a `tools/skill-validator` SOFT 
warning (matching the existing progressive-disclosure precedent).
   - **Add SOFT warnings for the 5 patterns** in `tools/skill-validator` — 
surface them in CI as advisory, not blocking; reviewer keeps the final say on 
borderline cases (scope statements, action enums that are the contract).
   - **Trim the remaining 12** — one small PR per skill. Five of those are 
previously-trimmed skills with residual violations, so they're useful 
calibration data for the codified guidance.
   
   Audit scripts (`measure_frontmatter.py`, `budget_check.py`, frontmatter 
dumper) are local; happy to PR them under `tools/skill-validator/` if useful as 
ongoing CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to