andreahlert commented on code in PR #215:
URL: https://github.com/apache/airflow-steward/pull/215#discussion_r3295149881


##########
tools/skill-validator/src/skill_validator/__init__.py:
##########
@@ -131,10 +131,30 @@
 PRINCIPLE_CATEGORY = "principle_compliance"
 TRIGGER_PRESERVATION_CATEGORY = "trigger_preservation"
 BODY_INLINE_CATEGORY = "body_inline"
+PRIVACY_CATEGORY = "privacy"
 SOFT_CATEGORIES: frozenset[str] = frozenset(
-    {PRINCIPLE_CATEGORY, TRIGGER_PRESERVATION_CATEGORY, BODY_INLINE_CATEGORY},
+    {PRINCIPLE_CATEGORY, TRIGGER_PRESERVATION_CATEGORY, BODY_INLINE_CATEGORY, PRIVACY_CATEGORY},
 )
 
+# ---------------------------------------------------------------------------
+# Privacy-LLM gate-check constants (write-skill/security-checklist.md § Pattern 6)
+# ---------------------------------------------------------------------------
+
+# Skill modes that process external / attacker-controlled content.
+_EXTERNAL_CONTENT_MODES: frozenset[str] = frozenset({"Triage", "Mentoring", "Drafting"})
+
+# The placeholder that marks a skill as referencing the private security tracker.
+_TRACKER_PLACEHOLDER = "<tracker>"
+
+# Indicates the skill actually *reads* full issue content from the tracker.
+# Skills that only write to / query metadata from the tracker (e.g. create an
+# issue, list milestones) do not pass private content to the model and are
+# therefore exempt from the Privacy-LLM gate-check.
+_TRACKER_READ_PHRASE = "gh issue view"

Review Comment:
   Good, the new compound check via `_TRACKER_ISSUE_API_RE` covers the common shape. Three holes still slip through, though:
   
   **1. Leading-slash form bypasses the check.** `gh` accepts both `gh api repos/...` and `gh api /repos/...`. The regex `\bgh\s+api\s+repos/<tracker>/issues/[^\s]+` only matches the slashless form, so:
   
   ```bash
   gh api /repos/<tracker>/issues/<N>
   ```
   
   reads the full body, yet the check returns false. Easy fix: make the slash optional, `\bgh\s+api\s+/?repos/<tracker>/issues/[^\s]+`.
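
A minimal sketch of the difference, assuming the pattern quoted above is the whole of `_TRACKER_ISSUE_API_RE` (the names `CURRENT_RE` and `PROPOSED_RE` are illustrative, not from the PR):

```python
import re

# Pattern as quoted in this comment; assumed to match the PR's read regex.
CURRENT_RE = re.compile(r"\bgh\s+api\s+repos/<tracker>/issues/[^\s]+")
# Same pattern with an optional leading slash before "repos".
PROPOSED_RE = re.compile(r"\bgh\s+api\s+/?repos/<tracker>/issues/[^\s]+")

slashless = "gh api repos/<tracker>/issues/<N>"
leading_slash = "gh api /repos/<tracker>/issues/<N>"

# Both forms read the issue body, so both should be detected.
current_hits = [bool(CURRENT_RE.search(s)) for s in (slashless, leading_slash)]
proposed_hits = [bool(PROPOSED_RE.search(s)) for s in (slashless, leading_slash)]
```

With these inputs the current pattern detects only the slashless form, while the proposed one detects both.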
   
   **2. Multi-line mutation produces a false positive.** 
`_has_tracker_body_read` iterates `body.splitlines()` and checks 
`_TRACKER_ISSUE_API_MUTATION_RE` on the same line as the URL. A standard 
backslash-continued PATCH:
   
   ```bash
   gh api repos/<tracker>/issues/<N> \
     -X PATCH \
     -f title=x
   ```
   
   matches the read regex on line 1, and because `-X PATCH` sits on the next line the mutation exclusion never fires. A pure write skill therefore gets flagged as a body-read and is forced to declare the privacy gate. The direction is safe (over-gating, not under-gating), but it will surprise authors. Either join continuation lines before matching or scope the exclusion to the surrounding fenced block instead of the single line.
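
A rough sketch of the first option (joining continuation lines before matching). `MUTATION_RE` is a hypothetical stand-in for `_TRACKER_ISSUE_API_MUTATION_RE`, whose real definition is not shown here, and `has_body_read` only approximates `_has_tracker_body_read`:

```python
import re

# Read pattern as quoted in this comment.
READ_RE = re.compile(r"\bgh\s+api\s+repos/<tracker>/issues/[^\s]+")
# Hypothetical stand-in for _TRACKER_ISSUE_API_MUTATION_RE.
MUTATION_RE = re.compile(r"(?:-X|--method)\s+(?:PATCH|POST|PUT|DELETE)\b")
# A backslash at end of line, plus the indentation of the next line.
CONTINUATION_RE = re.compile(r"\\\r?\n[ \t]*")


def join_continuations(body: str) -> str:
    """Collapse backslash-continued shell lines into one logical line."""
    return CONTINUATION_RE.sub(" ", body)


def has_body_read(body: str, *, join: bool) -> bool:
    """Return True if any line reads an issue without mutating it."""
    text = join_continuations(body) if join else body
    return any(
        READ_RE.search(line) and not MUTATION_RE.search(line)
        for line in text.splitlines()
    )


PATCH_SNIPPET = "gh api repos/<tracker>/issues/<N> \\\n  -X PATCH \\\n  -f title=x\n"
READ_SNIPPET = "gh api repos/<tracker>/issues/<N>\n"
```

Under these assumptions, `has_body_read(PATCH_SNIPPET, join=False)` reproduces the false positive, while `join=True` clears it and still flags `READ_SNIPPET`.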
   
   **3. `gh api graphql` / `gh issue list --json body` are not covered.** Edge 
cases today, but the same threat model (reads issue body via gh) applies. Worth 
a TODO at least.
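
If a TODO turns into code later, something along these lines might be a starting point. Both patterns and the helper name are hypothetical and deliberately coarse: they do not check that the command targets `<tracker>`, so they would over-gate rather than under-gate:

```python
import re

# Hypothetical patterns, not part of the PR.
# Any GraphQL call through gh could request an issue body.
GRAPHQL_RE = re.compile(r"\bgh\s+api\s+graphql\b")
# `gh issue list --json ...` with `body` among the comma-separated fields.
ISSUE_LIST_BODY_RE = re.compile(r"\bgh\s+issue\s+list\b.*--json[\s=]+\S*\bbody\b")


def may_read_body_via_other_routes(line: str) -> bool:
    """Coarse check for the two uncovered read paths named above."""
    return bool(GRAPHQL_RE.search(line) or ISSUE_LIST_BODY_RE.search(line))
```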



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
