choo121600 commented on code in PR #146:
URL: https://github.com/apache/airflow-steward/pull/146#discussion_r3252263471
##########
.claude/skills/issue-reassess-stats/classify.md:
##########
@@ -0,0 +1,140 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# Classify — bucketing the verdicts
+
+Companion to [`SKILL.md`](SKILL.md). Procedural detail for Step 2:
+bucketing each parsed verdict for the aggregation step.
+
+Classification is pure function of the parsed verdicts produced by
+[`fetch.md`](fetch.md) — no network, no writes. Any rule change
+here must agree with the producer's labelling logic in
+[`issue-reproducer/verification.md`](../issue-reproducer/verification.md).
+
+## Primary axis: `classification`
+
+Ten labels per
+[`issue-reproducer/verdict-composition.md`](../issue-reproducer/verdict-composition.md):
+
+| Label | Dashboard bucket |
+|---|---|
+| `fixed-on-master` | `fixed` (closure candidates) |
+| `still-fails-same` | `still-failing` (action candidates) |
+| `still-fails-different` | `still-failing` (action candidates) — but flagged: the failure shape differs from the original report |
+| `intended-behaviour` | `closed-as-intended` (closure candidates with docs-pointer) |
+| `duplicate-of-resolved` | `closed-as-duplicate` (closure candidates with sibling-key) |
+| `cannot-run-extraction` | `unrun` (limitations bucket) |
+| `cannot-run-environment` | `unrun` |
+| `cannot-run-dependency` | `unrun` |
+| `timeout` | `unrun` |
+| `needs-separate-workspace` | `unrun` |
+
+The four bucket-level labels (`fixed`, `still-failing`, `closed-
+as-intended`, `closed-as-duplicate`, `unrun`) drive the
+dashboard's hero cards.
+
+## Secondary axis: `nature`
+
+Five labels per
+[`issue-reproducer/verdict-composition.md` → *"The nature field"*](../issue-reproducer/verdict-composition.md#the-nature-field):
+
+| Label | Dashboard bucket |
+|---|---|
+| `bug-as-advertised` | `real-bug` |
+| `bug-as-advertised-partial-fix` | `real-bug-partial` (a sub-bucket; surfaces in the partial-fix panel) |
+| `feature-request` | `feature` |
+| `feature-request-disguised-as-bug` | `feature-disguised` (surfaces in the tracker-hygiene panel) |
+| `intended-and-documented` | `intended` (surfaces in the closure panel with docs-pointer) |
+
+## Cross-tabulation
+
+The most informative view: classification × nature. Common cells:
+
+| Classification × Nature | What it means |
+|---|---|
+| `still-fails-same` × `bug-as-advertised` | Direct action candidate — real bug, still broken, fix it |
+| `still-fails-same` × `feature-request-disguised-as-bug` | Tracker-hygiene candidate — re-type as Improvement, may or may not be implemented |
+| `still-fails-same` × `bug-as-advertised-partial-fix` | Partial-fix candidate — some cases pass now, others still fail; finish the fix |
+| `fixed-on-master` × `bug-as-advertised` | Standard closure — confirm and close |
+| `intended-behaviour` × `intended-and-documented` | Documentation-gap candidate when the reporter mis-read the docs (note in dashboard) |
+
+The classifier emits the full N × M counts; the
+[`aggregate.md`](aggregate.md) step turns these into the
+dashboard's headline counts.
+
+## Multi-case partial-fix detection
+
+When a verdict's `cases` array has mixed `match_on_master`
+values, the classifier flags the verdict as a multi-case partial
+fix. The flag is independent of the verdict's `classification`
+field — a verdict can be `still-fails-same` overall (because some
+cases still fail like the reporter said) AND be a partial fix
+(because other cases that used to fail now pass).
+
+The dashboard's *"partial-fix surfaces"* panel lists these
+verdicts with their `cases_summary` line.
+
+## New-issue candidates from probes
+
+A verdict's `cross_type_probe.findings` or
+`operator_variants_probe.findings` field, when non-empty, indicates
+the probe surfaced a bug in a sibling type that the original
+report didn't mention. These are new-issue candidates.
+
+The classifier collects every such finding across the campaign
+into the *"new-issue candidates"* list. Aggregation in
+[`aggregate.md`](aggregate.md) clusters related findings (e.g.,
+multiple probes from one family that surface the same sibling-
+type bug).
+
+## Per-component bucketing
+
+When verdicts carry component labels (extracted from the
+description or from a campaign-level metadata file), classify also
+buckets by component for the per-component breakdown.
+
+If no verdicts carry component data, the per-component section is
+omitted from the dashboard.
+
+## Age bucketing
+
+Compute `age_days` per verdict from the issue's creation date
+(extracted from `description.md` when present) and the campaign
+run date (`fetched_at` or filesystem mtime as fallback). Buckets:
+
+| Age | Label |
+|---|---|
+| < 1 year | `recent` |
+| 1–3 years | `mid` |
+| 3–10 years | `old` |
+| ≥ 10 years | `ancient` |
+
+The age axis informs the dashboard's *"oldest unresolved"* panel.
+

Review Comment:
   > I wonder if we should have some setup config to state the stage of the project and generate default buckets depending on that?

   I’m also +1 on Jarek’s suggestion.
   The appropriate bucket durations are likely to vary between projects. Rather than hardcoding them, it might be better to make them configurable through the adopt skill in setup-steward when adopting magpie for a project!
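
   To make the suggestion concrete, here is a minimal sketch of what configurable age buckets could look like. It is illustrative only and not taken from the PR: `AgeBucket`, `DEFAULT_AGE_BUCKETS`, `DAYS_PER_YEAR`, and `bucket_for_age` are hypothetical names, and the 365-day year is an assumption, since `classify.md` does not say how years map to `age_days`. The defaults mirror the table above; a project could override them at adoption time.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AgeBucket:
    """One age bucket (hypothetical type, not part of the PR)."""
    label: str
    max_years: Optional[float]  # exclusive upper bound; None means open-ended


# Defaults copied from the "Age bucketing" table in classify.md.
DEFAULT_AGE_BUCKETS = (
    AgeBucket("recent", 1),
    AgeBucket("mid", 3),
    AgeBucket("old", 10),
    AgeBucket("ancient", None),
)

# Assumption: classify.md does not define how a year converts to days.
DAYS_PER_YEAR = 365


def bucket_for_age(
    age_days: int,
    buckets: Sequence[AgeBucket] = DEFAULT_AGE_BUCKETS,
) -> str:
    """Return the label of the first bucket whose upper bound exceeds the age."""
    years = age_days / DAYS_PER_YEAR
    for bucket in buckets:
        if bucket.max_years is None or years < bucket.max_years:
            return bucket.label
    raise ValueError("bucket list must end with an open-ended bucket")


# A younger project could supply shorter durations through its own config.
young_project_buckets = (
    AgeBucket("recent", 0.25),
    AgeBucket("mid", 1),
    AgeBucket("old", None),
)
```

   With the defaults, `bucket_for_age(400)` falls in `mid`, while the same issue age under `young_project_buckets` falls in `old`. Keeping the thresholds as data rather than branches means the adopt skill would only need to write a list of label and duration pairs.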
