andreahlert commented on code in PR #215:
URL: https://github.com/apache/airflow-steward/pull/215#discussion_r3295149881
##########
tools/skill-validator/src/skill_validator/__init__.py:
##########
@@ -131,10 +131,30 @@
PRINCIPLE_CATEGORY = "principle_compliance"
TRIGGER_PRESERVATION_CATEGORY = "trigger_preservation"
BODY_INLINE_CATEGORY = "body_inline"
+PRIVACY_CATEGORY = "privacy"
SOFT_CATEGORIES: frozenset[str] = frozenset(
- {PRINCIPLE_CATEGORY, TRIGGER_PRESERVATION_CATEGORY, BODY_INLINE_CATEGORY},
+ {PRINCIPLE_CATEGORY, TRIGGER_PRESERVATION_CATEGORY, BODY_INLINE_CATEGORY, PRIVACY_CATEGORY},
)
+# ---------------------------------------------------------------------------
+# Privacy-LLM gate-check constants (write-skill/security-checklist.md § Pattern 6)
+# ---------------------------------------------------------------------------
+
+# Skill modes that process external / attacker-controlled content.
+_EXTERNAL_CONTENT_MODES: frozenset[str] = frozenset({"Triage", "Mentoring", "Drafting"})
+
+# The placeholder that marks a skill as referencing the private security tracker.
+_TRACKER_PLACEHOLDER = "<tracker>"
+
+# Indicates the skill actually *reads* full issue content from the tracker.
+# Skills that only write to / query metadata from the tracker (e.g. create an
+# issue, list milestones) do not pass private content to the model and are
+# therefore exempt from the Privacy-LLM gate-check.
+_TRACKER_READ_PHRASE = "gh issue view"
Review Comment:
Good, the new compound check via `_TRACKER_ISSUE_API_RE` covers the common
shape. A few holes still slip through, though:
**1. The leading-slash form bypasses the check.** `gh` accepts both
`gh api repos/...` and `gh api /repos/...`. The regex
`\bgh\s+api\s+repos/<tracker>/issues/[^\s]+` only matches the slashless form, so:
```bash
gh api /repos/<tracker>/issues/<N>
```
reads the full issue body, yet the check returns false. Easy fix:
`\bgh\s+api\s+/?repos/<tracker>/issues/[^\s]+`.
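A minimal check of point 1, taking the pattern quoted above at face value (`OLD_RE`, `FIXED_RE` and `SAMPLES` are just local names for this sketch, not names from the PR):

```python
import re

# Pattern as quoted in the review: only the slashless endpoint form.
OLD_RE = re.compile(r"\bgh\s+api\s+repos/<tracker>/issues/[^\s]+")

# Proposed fix: allow an optional leading slash before "repos/".
FIXED_RE = re.compile(r"\bgh\s+api\s+/?repos/<tracker>/issues/[^\s]+")

SAMPLES = [
    "gh api repos/<tracker>/issues/<N>",
    "gh api /repos/<tracker>/issues/<N>",
]

for line in SAMPLES:
    # search() rather than match(): the gh call can sit anywhere on the line.
    old_hit = bool(OLD_RE.search(line))
    fixed_hit = bool(FIXED_RE.search(line))
    print(f"{line!r}: old={old_hit} fixed={fixed_hit}")
```

The old pattern matches only the first sample; the `/?` version matches both.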
**2. Multi-line mutation produces a false positive.**
`_has_tracker_body_read` iterates `body.splitlines()` and checks
`_TRACKER_ISSUE_API_MUTATION_RE` on the same line as the URL. A standard
backslash-continued PATCH:
```bash
gh api repos/<tracker>/issues/<N> \
-X PATCH \
-f title=x
```
matches the read regex on line 1 and the mutation exclusion never fires, so
a pure write skill gets flagged as a body-read and forced to declare the
privacy gate. The direction is safe (over-gating, not under-gating), but it will
surprise authors. Either join continuation lines before matching, or scope the
exclusion to the surrounding fenced block instead of the single line.
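A rough sketch of the first option, under stated assumptions: the real `_TRACKER_ISSUE_API_MUTATION_RE` is not visible in this hunk, so `MUTATION_RE` below is a hypothetical stand-in that looks for `-X`/`--method` followed by a write verb, and `logical_lines` / `has_tracker_body_read` are illustrative names, not existing helpers. It reproduces the false positive with a per-line scan, then shows that joining continuations first clears it:

```python
import re

# Read pattern with the leading-slash fix from point 1 applied.
READ_RE = re.compile(r"\bgh\s+api\s+/?repos/<tracker>/issues/[^\s]+")

# HYPOTHETICAL stand-in for _TRACKER_ISSUE_API_MUTATION_RE (real pattern not shown).
MUTATION_RE = re.compile(r"(?:-X|--method)[\s=]+(?:PATCH|POST|PUT|DELETE)\b")


def naive_has_tracker_body_read(body: str) -> bool:
    """Per-line scan, mirroring the current body.splitlines() loop."""
    return any(
        READ_RE.search(line) and not MUTATION_RE.search(line)
        for line in body.splitlines()
    )


def logical_lines(body: str) -> list[str]:
    """Join shell backslash continuations so each command is one string."""
    joined: list[str] = []
    buf = ""
    for raw in body.splitlines():
        stripped = raw.rstrip()
        if stripped.endswith("\\"):
            # Drop the trailing backslash and keep accumulating the command.
            buf += stripped[:-1] + " "
            continue
        joined.append(buf + raw)
        buf = ""
    if buf:  # body ended on a dangling continuation
        joined.append(buf)
    return joined


def has_tracker_body_read(body: str) -> bool:
    """Same check, but over continuation-joined commands (illustrative name)."""
    return any(
        READ_RE.search(line) and not MUTATION_RE.search(line)
        for line in logical_lines(body)
    )


PATCH_SNIPPET = (
    "gh api repos/<tracker>/issues/<N> \\\n"
    "  -X PATCH \\\n"
    "  -f title=x\n"
)

print(naive_has_tracker_body_read(PATCH_SNIPPET))  # True: flagged as a read
print(has_tracker_body_read(PATCH_SNIPPET))        # False: -X PATCH is on the same logical line
```

The fenced-block alternative would slot into the same place (split on fences instead of continuations), with one trade-off: a genuine read that shares a block with an unrelated mutation would then be excluded, which is the unsafe direction.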
**3. `gh api graphql` / `gh issue list --json body` are not covered.** These are
edge cases today, but the same threat model (reading the issue body via `gh`)
applies. Worth at least a TODO.
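If it helps with the TODO, here is a rough idea of what coverage could look like. These patterns are hypothetical, not part of the PR. They assume the caller already knows the skill references `<tracker>` and passes in continuation-joined commands, and they lean toward over-gating. The GraphQL pattern treats `body`, `bodyHTML` and `bodyText` (the Issue content fields) as reads:

```python
import re

# HYPOTHETICAL patterns for point 3; not part of the PR.
# Any `gh api graphql` call whose text requests an issue content field.
GRAPHQL_BODY_RE = re.compile(
    r"\bgh\s+api\s+graphql\b.*\bbody(?:HTML|Text)?\b", re.DOTALL
)

# `gh issue list ... --json a,b,body` with `body` somewhere in the field list.
ISSUE_LIST_BODY_RE = re.compile(
    r"\bgh\s+issue\s+list\b.*--json[\s=]+(?:\w+,)*body\b"
)


def reads_body_via_other_gh_paths(command: str) -> bool:
    """True if a single, continuation-joined command reads issue bodies."""
    return bool(GRAPHQL_BODY_RE.search(command) or ISSUE_LIST_BODY_RE.search(command))


for cmd in [
    "gh issue list -R <tracker> --json number,title,body",
    "gh issue list -R <tracker> --json number,title",
    "gh api graphql -f query='{ repository(owner: \"o\", name: \"r\") { issue(number: 1) { bodyText } } }'",
    "gh api graphql -f query='{ viewer { login } }'",
]:
    print(reads_body_via_other_gh_paths(cmd), cmd)
```

Multi-line GraphQL queries inside a single-quoted argument would still need the fenced-block or whole-command view, since joining backslash continuations alone will not merge them.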
--
This is an automated message from the Apache Git Service.