justinmclean opened a new pull request, #154:
URL: https://github.com/apache/airflow-steward/pull/154

   The existing Step 2a fuzzy-match runs three structured searches (GHSA IDs, 
code pointers, subject keywords) against existing trackers. These work well 
when a report carries explicit technical identifiers, but miss the most common 
real-world duplicate pattern: the same vulnerability reported twice by 
different people with no shared identifiers, or the same reporter filing again 
weeks later with different framing.
   
   This PR adds two checks that run after the three-key search, and only when 
that search has found no STRONG (GHSA) match:
   
   Semantic comparison pass: fetches the title and first 300 characters of 
every open tracker in a single gh issue list call, produces a root-cause 
summary of the incoming report, and compares it against that corpus on four 
axes (component/subsystem, bug class, attack path, and fix shape). A two-axis 
overlap is MEDIUM; a three- or four-axis overlap is STRONG (same weight as a 
GHSA collision, so the report routes to security-issue-deduplicate rather than 
creating a new tracker).
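
   The axis-overlap rule above can be sketched roughly as follows. This is only 
an illustration of the thresholds, not the skill's implementation: the 
RootCause type, the function names, and the sample values are hypothetical, and 
comparing axes by normalized string equality is an assumption standing in for 
whatever judgment the skill actually applies.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class RootCause:
    """Hypothetical four-axis summary of a report or an existing tracker."""
    component: str
    bug_class: str
    attack_path: str
    fix_shape: str


def overlapping_axes(incoming: RootCause, tracker: RootCause) -> List[str]:
    """Return the names of the axes on which the two summaries agree.

    Assumption: agreement is case-insensitive string equality.
    """
    return [
        f.name
        for f in fields(RootCause)
        if getattr(incoming, f.name).strip().lower()
        == getattr(tracker, f.name).strip().lower()
    ]


def match_strength(incoming: RootCause, tracker: RootCause) -> Optional[str]:
    """Map the overlap count to the strengths named in the PR description."""
    overlap = len(overlapping_axes(incoming, tracker))
    if overlap >= 3:
        return "STRONG"   # three or four axes: same weight as a GHSA collision
    if overlap == 2:
        return "MEDIUM"   # two-axis overlap
    return None           # zero or one axis: no match from this pass alone


# Made-up sample data, for illustration only.
incoming = RootCause("api", "path traversal", "crafted file name", "input validation")
same_bug = RootCause("API", "Path traversal", "crafted file name", "allow-list")
same_subsystem_only = RootCause("api", "xss", "stored payload", "output encoding")
```

   With this sample data, match_strength(incoming, same_bug) is "STRONG" (three 
axes agree), while match_strength(incoming, same_subsystem_only) is None, which 
mirrors the false-positive trap where the subsystem matches but the bug class 
differs.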
   
   Reporter-identity check: searches open and recently closed trackers for the 
local part of the inbound reporter's email address. A hit on a related issue 
counts as MEDIUM even with only one-axis overlap; this is the primary signal 
for the same-reporter-different-framing case.
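
   As a rough sketch of the reporter-identity rule, assuming the search key is 
the lowercased local part of the address and that the axis-overlap count is 
already known: the function names here are hypothetical, and lowercasing the 
local part is an assumption made for matching convenience.

```python
from email.utils import parseaddr
from typing import Optional


def email_local_part(address: str) -> str:
    """Extract the part before the '@' from a bare or 'Name <addr>' address."""
    _name, addr = parseaddr(address)
    local, sep, _domain = addr.rpartition("@")
    if not sep or not local:
        raise ValueError(f"not an email address: {address!r}")
    # Assumption: trackers are searched case-insensitively.
    return local.lower()


def strength_with_reporter(axis_overlap: int, same_reporter: bool) -> Optional[str]:
    """Combine the axis-overlap count with a reporter-identity hit."""
    if axis_overlap >= 3:
        return "STRONG"
    if axis_overlap == 2:
        return "MEDIUM"
    if axis_overlap == 1 and same_reporter:
        return "MEDIUM"   # one-axis overlap is promoted by the reporter hit
    return None
```

   Here a single-axis overlap that would otherwise be ignored becomes MEDIUM 
once the same reporter is found on the related tracker; a reporter hit with no 
axis overlap at all is left unmatched, on the reading that "related issue" 
requires at least one shared axis.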
   
   The budget guardrail is updated from 5 to 6 gh calls per candidate to 
account for the new bulk-list and reporter-identity calls, plus up to 3 
follow-up full-body reads on the highest-scoring semantic candidates.
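
   One way to picture the guardrail is as a pair of counters. This is a 
hypothetical sketch, not the skill's mechanism: the class and method names are 
invented, and it assumes the 3 follow-up full-body reads are tracked separately 
from the 6-call budget, which the description above does not state explicitly.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a candidate would exceed its gh call allowance."""


class GhCallBudget:
    """Per-candidate allowance: 6 gh calls plus up to 3 follow-up body reads.

    Assumption: follow-up reads are counted apart from the main budget.
    """

    def __init__(self, max_calls: int = 6, max_followup_reads: int = 3) -> None:
        self.max_calls = max_calls
        self.max_followup_reads = max_followup_reads
        self.calls = 0
        self.followup_reads = 0

    def spend_call(self) -> None:
        """Record one structured search, bulk-list, or reporter-identity call."""
        if self.calls >= self.max_calls:
            raise BudgetExceeded(f"gh call budget of {self.max_calls} exhausted")
        self.calls += 1

    def spend_followup_read(self) -> None:
        """Record one full-body read of a high-scoring semantic candidate."""
        if self.followup_reads >= self.max_followup_reads:
            raise BudgetExceeded(
                f"follow-up read budget of {self.max_followup_reads} exhausted"
            )
        self.followup_reads += 1
```

   A caller would invoke spend_call() before each gh call and 
spend_followup_read() before each full-body read, and stop widening the search 
when BudgetExceeded is raised.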
   
   Testing: three synthetic test cases were verified manually. A clear 
duplicate fires STRONG; a false-positive trap with the same subsystem but a 
different bug class is correctly suppressed; and the same reporter with 
different framing fires STRONG on axes, with the identity check firing as a 
supporting signal. skill-validate passes with no violations.

