Re: [PR] issue-* skill family proposal (prototyped against Groovy's existing skills) [airflow-steward]

via GitHub Sat, 16 May 2026 04:22:38 -0700


paulk-asert commented on code in PR #146:
URL: https://github.com/apache/airflow-steward/pull/146#discussion_r3252726438



##########
.claude/skills/issue-reassess-stats/classify.md:
##########
@@ -0,0 +1,140 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# Classify — bucketing the verdicts
+
+Companion to [`SKILL.md`](SKILL.md). Procedural detail for Step 2:
+bucketing each parsed verdict for the aggregation step.
+
+Classification is a pure function of the parsed verdicts produced by
+[`fetch.md`](fetch.md) — no network, no writes. Any rule change
+here must agree with the producer's labelling logic in
+[`issue-reproducer/verification.md`](../issue-reproducer/verification.md).
+
+## Primary axis: `classification`
+
+Ten labels per
+[`issue-reproducer/verdict-composition.md`](../issue-reproducer/verdict-composition.md):
+
+| Label | Dashboard bucket |
+|---|---|
+| `fixed-on-master` | `fixed` (closure candidates) |
+| `still-fails-same` | `still-failing` (action candidates) |
+| `still-fails-different` | `still-failing` (action candidates), but flagged: the failure shape differs from the original report |
+| `intended-behaviour` | `closed-as-intended` (closure candidates with docs-pointer) |
+| `duplicate-of-resolved` | `closed-as-duplicate` (closure candidates with sibling-key) |
+| `cannot-run-extraction` | `unrun` (limitations bucket) |
+| `cannot-run-environment` | `unrun` |
+| `cannot-run-dependency` | `unrun` |
+| `timeout` | `unrun` |
+| `needs-separate-workspace` | `unrun` |
+
+The five bucket-level labels (`fixed`, `still-failing`,
+`closed-as-intended`, `closed-as-duplicate`, `unrun`) drive the
+dashboard's hero cards.
+
+## Secondary axis: `nature`
+
+Five labels per
+[`issue-reproducer/verdict-composition.md` → *"The nature field"*](../issue-reproducer/verdict-composition.md#the-nature-field):
+
+| Label | Dashboard bucket |
+|---|---|
+| `bug-as-advertised` | `real-bug` |
+| `bug-as-advertised-partial-fix` | `real-bug-partial` (a sub-bucket; surfaces in the partial-fix panel) |
+| `feature-request` | `feature` |
+| `feature-request-disguised-as-bug` | `feature-disguised` (surfaces in the tracker-hygiene panel) |
+| `intended-and-documented` | `intended` (surfaces in the closure panel with docs-pointer) |
+
+## Cross-tabulation
+
+The most informative view: classification × nature. Common cells:
+
+| Classification × Nature | What it means |
+|---|---|
+| `still-fails-same` × `bug-as-advertised` | Direct action candidate: real bug, still broken, fix it |
+| `still-fails-same` × `feature-request-disguised-as-bug` | Tracker-hygiene candidate: re-type as Improvement, may or may not be implemented |
+| `still-fails-same` × `bug-as-advertised-partial-fix` | Partial-fix candidate: some cases pass now, others still fail; finish the fix |
+| `fixed-on-master` × `bug-as-advertised` | Standard closure: confirm and close |
+| `intended-behaviour` × `intended-and-documented` | Documentation-gap candidate when the reporter mis-read the docs (note in dashboard) |
+
+The classifier emits the full N × M counts; the
+[`aggregate.md`](aggregate.md) step turns these into the
+dashboard's headline counts.
+
+## Multi-case partial-fix detection
+
+When a verdict's `cases` array has mixed `match_on_master`
+values, the classifier flags the verdict as a multi-case partial
+fix. The flag is independent of the verdict's `classification`
+field — a verdict can be `still-fails-same` overall (because some
+cases still fail like the reporter said) AND be a partial fix
+(because other cases that used to fail now pass).
+
+The dashboard's *"partial-fix surfaces"* panel lists these
+verdicts with their `cases_summary` line.
+
+## New-issue candidates from probes
+
+A verdict's `cross_type_probe.findings` or
+`operator_variants_probe.findings` field, when non-empty, indicates
+the probe surfaced a bug in a sibling type that the original
+report didn't mention. These are new-issue candidates.
+
+The classifier collects every such finding across the campaign
+into the *"new-issue candidates"* list. Aggregation in
+[`aggregate.md`](aggregate.md) clusters related findings (e.g.,
+multiple probes from one family that surface the same
+sibling-type bug).
+
+## Per-component bucketing
+
+When verdicts carry component labels (extracted from the
+description or from a campaign-level metadata file), classify also
+buckets by component for the per-component breakdown.
+
+If no verdicts carry component data, the per-component section is
+omitted from the dashboard.
+
+## Age bucketing
+
+Compute `age_days` per verdict from the issue's creation date
+(extracted from `description.md` when present) and the campaign
+run date (`fetched_at` or filesystem mtime as fallback). Buckets:
+
+| Age | Label |
+|---|---|
+| < 1 year | `recent` |
+| 1–3 years | `mid` |
+| 3–10 years | `old` |
+| ≥ 10 years | `ancient` |
+
+The age axis informs the dashboard's *"oldest unresolved"* panel.
+

Review Comment:
   The skill now documents that the age bands are defaults, not a contract, 
states why "recent" is project-relative, and routes adopters to retune the band 
edges via the existing .apache-steward-overrides/issue-reassess-stats.md 
mechanism (mirroring how health-rating thresholds are already overridable).
   I think generated preset default bands might also be nice as a 
follow-up; this PR just makes the capability discoverable in the meantime. If 
you think that is a blocker, let me know.
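
To make the "defaults, not a contract" point concrete, here is a minimal Python sketch of age bucketing with tunable band edges. It is an illustration only, not the skill's implementation: the function name `age_bucket`, the `bands` parameter, the half-open treatment of band edges, and the 365.25-day year are all assumptions, and the format of the overrides file is not modelled here.

```python
from datetime import date

# Default band edges in years, copied from the "Age bucketing" table.
# Each entry is (exclusive upper bound in years, label); None = open-ended.
# Assumption: edges are half-open, so exactly 3 years counts as `old`.
DEFAULT_BANDS = [(1, "recent"), (3, "mid"), (10, "old"), (None, "ancient")]


def age_bucket(created: date, run_date: date, bands=DEFAULT_BANDS) -> str:
    """Return the age label for an issue created on `created`.

    `bands` is a hypothetical hook standing in for whatever an adopter
    configures through the overrides mechanism.
    """
    age_days = (run_date - created).days
    age_years = age_days / 365.25  # assumption: average year length
    for upper, label in bands:
        if upper is None or age_years < upper:
            return label
    raise ValueError("bands must end with an open-ended (None) entry")


# A fast-moving project might treat anything over six months as stale.
custom_bands = [(0.5, "recent"), (None, "stale")]
print(age_bucket(date(2020, 1, 1), date(2026, 5, 16)))
print(age_bucket(date(2024, 1, 1), date(2025, 6, 1), custom_bands))
```

Passing the bands as data rather than hard-coding them is what would let a project retune the edges without touching the classifier.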



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
