(airflow-steward) branch main updated: Add security-tracker-stats-dashboard tool + skill (#248)

potiuk Fri, 22 May 2026 11:04:23 -0700

This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git



The following commit(s) were added to refs/heads/main by this push:
     new f2104ed  Add security-tracker-stats-dashboard tool + skill (#248)
f2104ed is described below

commit f2104ed463f1b22a3547066543738823f9a9bcd8
Author: Jarek Potiuk <[email protected]>
AuthorDate: Fri May 22 20:03:29 2026 +0200

    Add security-tracker-stats-dashboard tool + skill (#248)
    
    Generalised from the airflow-s reference dashboard, this adds a
    read-only stats dashboard for any apache-steward adopter. Output is a
    self-contained HTML page with Plotly charts: lifecycle bands (per
    adopter-configurable categories), opened-vs-untriaged backlog,
    cumulative opened/closed, mean time to triage, mean time to first
    response, and PR-driven mean times (createdAt -> PR-opened, PR-open
    -> PR-merged, PR-merged -> advisory announced).
    
    Configuration lives in a YAML overlay the adopter places at
    `.apache-steward-overrides/security-tracker-stats.yaml` (path configurable in
    `<project-config>/security-tracker-stats.md`). Defaults reproduce the
    airflow-s reference implementation byte-for-byte; everything that
    was hardcoded there (bucket granularity, milestones, scope labels,
    category predicates, triage keywords, bot prefixes, upstream repo)
    is now an overrideable knob.
    
    The skill at `.claude/skills/security-tracker-stats-dashboard/SKILL.md`
    follows the framework's standard structural template (placeholder
    convention header, adopter-override section, snapshot-drift section,
    prerequisites, inputs, how-to-invoke, golden rules, failure modes).
    
    The renderer prefers PyYAML when available and falls back to a tiny
    bundled YAML subset parser when it is not, so adopters without a
    build step still get the dashboard.
---
 .../security-tracker-stats-dashboard/SKILL.md      |  267 +++++
 projects/_template/security-tracker-stats.md       |  139 +++
 tools/security-tracker-stats-dashboard/README.md   |  181 ++++
 .../default-config.yaml                            |  135 +++
 .../fetch_bodies.py                                |   56 +
 .../fetch_events.py                                |   61 ++
 .../fetch_issues.py                                |   26 +
 .../security-tracker-stats-dashboard/fetch_prs.py  |  103 ++
 .../fetch_roster.py                                |   24 +
 tools/security-tracker-stats-dashboard/render.py   | 1121 ++++++++++++++++++++
 tools/security-tracker-stats-dashboard/run.sh      |   58 +
 11 files changed, 2171 insertions(+)

diff --git a/.claude/skills/security-tracker-stats-dashboard/SKILL.md b/.claude/skills/security-tracker-stats-dashboard/SKILL.md
new file mode 100644
index 0000000..d940c0b
--- /dev/null
+++ b/.claude/skills/security-tracker-stats-dashboard/SKILL.md
@@ -0,0 +1,267 @@
+---
+name: security-tracker-stats-dashboard
+description: |
+  Generate a self-contained HTML dashboard of `<tracker>` repository
+  statistics: issue-lifecycle bands (untriaged / triaged / PR-merged /
+  fixed-released / closed-other), opened-vs-untriaged backlog,
+  cumulative opened/closed, mean time to triage, mean time to first
+  response, and — when `<upstream>` is configured — mean time
+  createdAt -> PR-opened, PR-open -> PR-merged, and PR-merged ->
+  advisory announced. All charts are line / area (no bars) with
+  `connectgaps: true`. Vertical annotations on every chart mark the
+  milestones declared in the project's overlay (e.g. "skill
+  adoption", "team handover", "process change").
+when_to_use: |
+  Invoke when the user says "regenerate the tracker dashboard", "show
+  monthly/quarterly stats", "tracker stats", "dashboard", or
+  variations. Also when an existing dashboard at the configured output
+  path is stale (older than ~24 h) and the user is reviewing tracker
+  health. Read-only — the skill never modifies any tracker state.
+license: Apache-2.0
+---
+
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!-- Placeholder convention (see AGENTS.md#placeholder-convention-used-in-skill-files):
+     <project-config> -> adopting project's `.apache-steward/` directory
+     <framework>      -> framework root (the `.apache-steward/`
+                         snapshot in an adopter repo, or `.` in the
+                         framework standalone checkout)
+     <tracker>        -> value of `tracker_repo:` in <project-config>/project.md
+                         (example: airflow-s/airflow-s)
+     <upstream>       -> value of `upstream_repo:` in <project-config>/project.md
+                         (example: apache/airflow); may be null for
+                         trackers whose fixes do not land in a
+                         single upstream codebase.
+     Before running any bash command below, substitute these with the
+     concrete values from the adopting project's <project-config>/project.md. -->
+
+# security-tracker-stats-dashboard
+
+Read-only skill that renders a self-contained HTML page summarising
+the state of `<tracker>` over time. The skill wraps the
+[`tools/security-tracker-stats-dashboard/`](../../../tools/security-tracker-stats-dashboard/README.md)
+runtime tool — both the slash-command path (this skill) and the
+script path (`run.sh`) run the same fetch + render pipeline; the
+skill adds invocation niceties (resolving cache paths, surfacing the
+output URL, proposing a stale-cache refresh) but never mutates
+anything.
+
+The skill is **read-only on GitHub** — it does not create or modify
+issues, comments, labels, or PRs. It only fetches data via `gh` and
+renders an HTML file.
+
+---
+
+## Adopter overrides
+
+Before running the default behaviour documented
+below, this skill consults
+[`.apache-steward-overrides/security-tracker-stats-dashboard.md`](../../../docs/setup/agentic-overrides.md)
+in the adopter repo if it exists, and applies any
+agent-readable overrides it finds. See
+[`docs/setup/agentic-overrides.md`](../../../docs/setup/agentic-overrides.md)
+for the contract — what overrides may contain, hard
+rules, the reconciliation flow on framework upgrade,
+upstreaming guidance.
+
+Configuration for the *renderer* (bucket granularity, milestones,
+categories, scope labels, triage keywords, …) lives in a separate
+YAML file the adopter places at
+`.apache-steward-overrides/security-tracker-stats.yaml` (path is
+adopter-configurable via `tracker_stats_config:` in
+[`<project-config>/security-tracker-stats.md`](../../../projects/_template/security-tracker-stats.md)).
+The agentic override file above is reserved for *behavioural*
+overrides of this skill (when to propose a refresh, where to write
+the HTML, etc.); renderer knobs go in the YAML config.
+
+**Hard rule**: agents NEVER modify the snapshot under
+`<adopter-repo>/.apache-steward/`. Local modifications
+go in the override file. Framework changes go via PR
+to `apache/airflow-steward`.
+
+---
+
+## Snapshot drift
+
+Also at the top of every run, this skill compares the
+gitignored `.apache-steward.local.lock` (per-machine
+fetch) against the committed `.apache-steward.lock`
+(the project pin). On mismatch the skill surfaces the
+gap and proposes
+[`/setup-steward upgrade`](../setup-steward/upgrade.md).
+The proposal is non-blocking — the user may defer if
+they want to run with the local snapshot for now. See
+[`docs/setup/install-recipes.md` § Subsequent runs and drift detection](../../../docs/setup/install-recipes.md#subsequent-runs-and-drift-detection)
+for the full flow.
+
+Drift severity:
+
+- **method or URL differ** -> ✗ full re-install needed.
+- **ref differs** (project bumped tag, or `git-branch`
+  local is behind upstream tip) -> ⚠ sync needed.
+- **`svn-zip` SHA-512 mismatches the committed
+  anchor** -> ✗ security-flagged; investigate before
+  upgrading.
+
+---
+
+## Prerequisites
+
+- `gh` authenticated with read access to `<tracker>` (and to
+  `<upstream>` for PR metadata, when configured).
+- `python3` (3.9+).
+- `jq` (used by `fetch_events.py` via gh's `--jq` flag).
+- Network access to `api.github.com` and (for *viewing* the output
+  HTML) Plotly's CDN.
+- Optional: PyYAML. When missing, the renderer falls back to a
+  bundled minimal YAML subset parser sufficient for
+  `default-config.yaml` and typical overlays.
+
+---
+
+## Inputs
+
+The skill accepts the following optional arguments:
+
+| Selector | Meaning |
+|---|---|
+| *(no args)* | render with all defaults — monthly buckets, default categories, the adopter's milestones |
+| `quarterly` / `monthly` | override the bucket granularity |
+| `<output-path>` | write the HTML to a specific path |
+| `clear-cache` | delete the fetch cache before fetching |
+| `since:YYYY-MM` / `since:YYYY-Qn` | override the start bucket |
+
+If the adopter passes nothing, surface the resolved output path and
+cache state up front so they can interrupt before a 5-10 minute
+fetch.
+
+---
+
+## How to invoke
+
+1. **Resolve config.** Read
+   [`<project-config>/security-tracker-stats.md`](../../../projects/_template/security-tracker-stats.md)
+   for the project's per-renderer YAML config path (default:
+   `<adopter-repo>/.apache-steward-overrides/security-tracker-stats.yaml`).
+   Surface to the user *which* config file will be applied and
+   *what bucket granularity* it resolves to. If the YAML file does
+   not exist, fall back silently to the framework's
+   `default-config.yaml`.
+
+2. **Check cache freshness.** Inspect
+   `${TRACKER_STATS_CACHE:-/tmp/tracker-stats-cache}/issues.json`
+   mtime. If older than 24 h, propose a fresh fetch; if missing or
+   the user passed `clear-cache`, do a fresh fetch unconditionally.
+
+3. **Run the orchestrator.** Substitute placeholders and invoke:
+
+   ```bash
+   TRACKER_STATS_REPO=<tracker> \
+   TRACKER_STATS_UPSTREAM_REPO=<upstream> \
+   TRACKER_STATS_CONFIG=<adopter-repo>/.apache-steward-overrides/security-tracker-stats.yaml \
+   bash <framework>/tools/security-tracker-stats-dashboard/run.sh <output-path>
+   ```
+
+   When the user passed `monthly` / `quarterly` or
+   `since:<start>`, prepend the matching `TRACKER_STATS_BUCKETS=` /
+   `TRACKER_STATS_START=` env vars.
+
+4. **Report the result.** Print the final HTML path and a short
+   summary (total trackers, open count, latest-bucket category
+   breakdown, triage-median, PR-merge-median when configured). The
+   pipeline already echoes most of this to stdout — pass it
+   through verbatim and add the clickable
+   `file://<output-path>` line at the end.
+
+The full pipeline:
+
+1. `fetch_issues.py` — `gh issue list --state all --limit 1000` ->
+   `<cache>/issues.json`.
+2. `fetch_roster.py` — `gh api repos/<tracker>/collaborators` ->
+   `<cache>/roster.txt`.
+3. `fetch_bodies.py` — per-issue `body` +
+   `closedByPullRequestsReferences` -> `<cache>/issue_extra.json`.
+4. `fetch_events.py` — per-issue label-history events ->
+   `<cache>/events/<N>.json`.
+5. `fetch_prs.py` — per-PR `createdAt` / `mergedAt` / `state` from
+   `<upstream>` -> `<cache>/prs.json`. Silent no-op when
+   `TRACKER_STATS_UPSTREAM_REPO` is empty or `none`.
+6. `render.py` — reads cache + config, writes HTML to
+   `$TRACKER_STATS_OUT`.
+
+Each fetch script resumes from cache, so re-running after a partial
+failure (rate limit, transient HTTP error) only re-fetches what is
+missing.
+
+---
+
+## Configuration overview
+
+See
+[`tools/security-tracker-stats-dashboard/default-config.yaml`](../../../tools/security-tracker-stats-dashboard/default-config.yaml)
+for the schema with inline documentation, and
+[`tools/security-tracker-stats-dashboard/README.md`](../../../tools/security-tracker-stats-dashboard/README.md)
+for the load order, predicate keys, and snapshot replay semantics.
+
+The most-overridden knobs by adopters tend to be:
+
+- **`buckets:`** — monthly vs. quarterly. Smaller tracker repos
+  (<50 issues / year) read better at quarterly granularity.
+- **`milestones:`** — vertical annotations marking process
+  changes the dashboard should highlight (skill adoption, team
+  handover, policy update). Set to `[]` to remove them.
+- **`scope_labels:`** — the project's primary "what does this
+  affect" axis. Defaults to `[airflow, providers, chart]`;
+  adopters use whatever scope-label set
+  [`<project-config>/scope-labels.md`](../../../projects/_template/scope-labels.md)
+  declares.
+- **`categories:`** — the lifecycle-band classification rules.
+  Defaults match the airflow-s reference implementation
+  byte-for-byte; adopters with different label conventions
+  (e.g. `triaged` instead of *no `needs triage`*) re-state the
+  whole list.
+- **`triage.keywords:`** / **`triage.bot_prefixes:`** — the
+  time-to-triage signal. Adopters whose security team uses
+  different phrasing in triage-proposal comments override these.
+
+---
+
+## Hard rules
+
+**Golden rule 1 — read only, never write.** The skill must not
+post comments, add labels, close, edit, or otherwise mutate any
+tracker, PR, or upstream resource. If the user asks for stats and
+also wants an action, decline the mutation.
+
+**Golden rule 2 — proposal-before-fetch on stale cache.** Before
+running a fresh full fetch (which costs ~5-10 minutes of `gh` API
+calls), surface the proposal and wait for explicit user
+confirmation. Incremental re-renders against a warm cache (~30
+seconds) can run without a prompt.
+
+**Golden rule 3 — never edit the snapshot.** As with every other
+skill, agentic overrides go in
+`.apache-steward-overrides/security-tracker-stats-dashboard.md`; renderer
+overrides go in the project's tracker-stats YAML config file. The
+gitignored snapshot under `.apache-steward/` is never modified.
+
+**Golden rule 4 — surface the config path on every run.** The
+dashboard's output depends entirely on which YAML file the renderer
+loaded. Print the resolved config path (or "default") as the first
+line of skill output so the user can tell at a glance whether their
+overlay is being picked up.
+
+---
+
+## Failure modes
+
+| Symptom | Cause | Fix |
+|---|---|---|
+| `events/<N>.json` missing for some N | gh transient failure during paginate | Re-run; `fetch_events.py` resumes from cache |
+| `prs.json` has `{"error": ...}` entries | False-positive body parse (PR# doesn't exist) | Silently filtered at render; safe to ignore |
+| `c_rel` median jumps after re-fetch | New advisory shipped since last run | Expected — re-render is correct |
+| Empty `c_prc` / `c_prm` / `c_rel` early buckets | No linked PR in those tracker buckets | Expected — not all early trackers had a fix PR |
+| Three PR charts missing entirely | `upstream_repo: null` in config (or env override) | By design — set `upstream_repo:` if you want them |
+| `ModuleNotFoundError: yaml` | PyYAML missing | Bundled fallback parser handles `default-config.yaml`; install pyyaml for richer overlays |
diff --git a/projects/_template/security-tracker-stats.md b/projects/_template/security-tracker-stats.md
new file mode 100644
index 0000000..ab7619a
--- /dev/null
+++ b/projects/_template/security-tracker-stats.md
@@ -0,0 +1,139 @@
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+**Table of Contents**  *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [security-tracker-stats.md (template)](#security-tracker-statsmd-template)
+  - [YAML config path](#yaml-config-path)
+  - [Default output path](#default-output-path)
+  - [Cache directory](#cache-directory)
+  - [Refresh cadence](#refresh-cadence)
+  - [Example overlay (`security-tracker-stats.yaml`)](#example-overlay-security-tracker-statsyaml)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# security-tracker-stats.md (template)
+
+Per-project configuration consumed by the
+[`security-tracker-stats-dashboard`](../../.claude/skills/security-tracker-stats-dashboard/SKILL.md)
+skill. Copy this file into your project's `<project-config>/`
+directory and edit the values below. Everything is optional — the
+skill falls back to
+[`tools/security-tracker-stats-dashboard/default-config.yaml`](../../tools/security-tracker-stats-dashboard/default-config.yaml)
+when a key is unset.
+
+## YAML config path
+
+```yaml
+tracker_stats_config: .apache-steward-overrides/security-tracker-stats.yaml
+```
+
+The renderer reads its configuration from the YAML file pointed at by
+the `TRACKER_STATS_CONFIG` env var. The skill resolves this from
+`tracker_stats_config:` above (interpreting it relative to the
+adopter repo root). Adopters who want the framework's defaults
+verbatim can leave this unset; the skill will skip the overlay step.
+
+The YAML schema is documented inline at
+[`tools/security-tracker-stats-dashboard/default-config.yaml`](../../tools/security-tracker-stats-dashboard/default-config.yaml).
+
+## Default output path
+
+```yaml
+tracker_stats_output: tmp/tracker_stats.html
+```
+
+The skill writes the rendered HTML to this path (relative to the
+adopter repo root, or absolute) when the user does not pass an
+explicit `<output-path>` argument. The
+`airflow-s/airflow-s` adopter uses `tmp/airflow_s_monthly.html`
+(committed into `tmp/` as the canonical artefact for security-team
+review).
+
+## Cache directory
+
+```yaml
+tracker_stats_cache: /tmp/tracker-stats-cache
+```
+
+Where the fetch scripts persist their cache. Safe to delete (forces a
+full re-fetch). The skill resolves this to the `TRACKER_STATS_CACHE`
+env var.
+
+## Refresh cadence
+
+```yaml
+tracker_stats_refresh_hours: 24
+```
+
+The skill considers the cache stale when `issues.json` is older than
+this many hours, and proposes a refresh before re-rendering. Lower
+this for fast-moving trackers; raise it for trackers where the
+dashboard is reviewed weekly or monthly.
+
+## Example overlay (`security-tracker-stats.yaml`)
+
+A minimal overlay that swaps to quarterly buckets and adds a
+project-specific milestone:
+
+```yaml
+buckets: quarterly
+
+milestones:
+  - date: 2026-04-20
+    label: skill adoption
+  - date: 2026-09-01
+    label: handover to PMC sec team
+```
+
+A bigger overlay that renames the scope labels for a non-Airflow
+adopter and removes the upstream-PR charts entirely (because fixes
+land in many repos, not a single `<upstream>`):
+
+```yaml
+upstream_repo: null
+
+scope_labels: [core, plugins, docs]
+
+milestones: []
+
+# Re-state the full categories list to align with the project's
+# label conventions. The framework's default categories assume
+# `needs triage`, `pr merged`, `fix released`, `announced - emails
+# sent`, `cve allocated` — projects with different label vocabularies
+# need to re-state predicates explicitly.
+categories:
+  - name: fixed_released
+    color: "#2ca02c"
+    predicate:
+      any_of:
+        - any_label: [released]
+        - all_of:
+            state: closed
+            state_reason: COMPLETED
+            any_label: [security-fix]
+  - name: closed_other
+    color: "#888888"
+    predicate:
+      state: closed
+  - name: open_untriaged
+    color: "#d62728"
+    predicate:
+      all_of:
+        state: open
+        any_of:
+          - any_label: [needs triage]
+          - no_scope_label: true
+  - name: open_pr_merged
+    color: "#e67e22"
+    predicate:
+      all_of:
+        state: open
+        any_label: [pr merged]
+  - name: open_triaged
+    color: "#f1c40f"
+    predicate:
+      state: open
+```
diff --git a/tools/security-tracker-stats-dashboard/README.md b/tools/security-tracker-stats-dashboard/README.md
new file mode 100644
index 0000000..06dcced
--- /dev/null
+++ b/tools/security-tracker-stats-dashboard/README.md
@@ -0,0 +1,181 @@
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+**Table of Contents**  *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [security-tracker-stats-dashboard](#security-tracker-stats-dashboard)
+  - [Layout](#layout)
+  - [Invocation](#invocation)
+    - [Resume behaviour](#resume-behaviour)
+  - [Configuration](#configuration)
+    - [Categories (lifecycle bands)](#categories-lifecycle-bands)
+    - [Time-to-triage signal](#time-to-triage-signal)
+    - [Milestones (vertical annotations)](#milestones-vertical-annotations)
+    - [When `upstream_repo` is null](#when-upstream_repo-is-null)
+  - [Prerequisites](#prerequisites)
+  - [Failure modes](#failure-modes)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# security-tracker-stats-dashboard
+
+Generate a self-contained HTML dashboard of `<tracker>` repository
+statistics — issue-lifecycle bands (untriaged / triaged / PR-merged /
+fixed-released / closed-other), opened-vs-untriaged backlog, cumulative
+opened/closed, and mean-time-to-triage / first-response / PR-open /
+PR-merge / advisory-announced.
+
+All charts are line / area (no bars) with `connectgaps: true`. Plotly
+loaded via CDN — the output HTML is self-contained but viewing it
+requires network access for the chart library.
+
+The tool is **read-only on GitHub** — it does not create or modify
+issues, comments, labels, or PRs. It only fetches data via `gh` and
+renders an HTML file.
+
+The companion agentic skill at
+[`.claude/skills/security-tracker-stats-dashboard/SKILL.md`](../../.claude/skills/security-tracker-stats-dashboard/SKILL.md)
+wraps this tool and surfaces it through Claude Code's slash-command
+interface; both routes (script-only and skill-driven) run the same
+fetch + render pipeline.
+
+## Layout
+
+```text
+tools/security-tracker-stats-dashboard/
+├── README.md             (this file)
+├── default-config.yaml   (config schema + adopter-overridable defaults)
+├── render.py             (renders cached data to HTML; reads config)
+├── fetch_issues.py       (gh issue list -> issues.json)
+├── fetch_roster.py       (gh api collaborators -> roster.txt)
+├── fetch_bodies.py       (per-issue body + closedByPRs -> issue_extra.json)
+├── fetch_events.py       (per-issue label history -> events/<N>.json)
+├── fetch_prs.py          (per-PR metadata from <upstream> -> prs.json)
+└── run.sh                (orchestrator)
+```
+
+## Invocation
+
+```bash
+bash <framework>/tools/security-tracker-stats-dashboard/run.sh [<output-path>]
+```
+
+Env knobs (all optional):
+
+| Var | Default | Notes |
+|---|---|---|
+| `TRACKER_STATS_REPO` | *(e.g. `airflow-s/airflow-s`)* | `<tracker>` repo slug |
+| `TRACKER_STATS_OUT` | `/tmp/airflow_s_monthly.html` | output HTML path |
+| `TRACKER_STATS_CACHE` | `/tmp/tracker-stats-cache` | fetch cache dir |
+| `TRACKER_STATS_CONFIG` | *(unset)* | path to a YAML overlay file |
+| `TRACKER_STATS_BUCKETS` | *(from config: `monthly`)* | `monthly` or `quarterly` |
+| `TRACKER_STATS_START` | *(from config: `null`)* | `YYYY-MM` or `YYYY-Qn` |
+| `TRACKER_STATS_UPSTREAM_REPO` | *(from config; e.g. `apache/airflow`)* | `<upstream>` repo slug; `none` skips PR charts |
+
+### Resume behaviour
+
+Each fetch script resumes from cache, so re-running after a partial
+failure (rate limit, transient HTTP error) only re-fetches what is
+missing. Delete the cache dir to force a fresh full fetch.
+
+Fetches are parallelised (`ThreadPoolExecutor`, ~10 workers). A fresh
+run is ~5–10 minutes on a 250-issue tracker; incremental re-renders
+(cache warm) are ~30 seconds.
+
+## Configuration
+
+`render.py` loads configuration in this order, highest priority last:
+
+1. `default-config.yaml` (in this directory).
+2. `$TRACKER_STATS_CONFIG` overlay YAML, when set (typically
+   `<adopter-repo>/.apache-steward-overrides/security-tracker-stats.yaml`).
+   Deep-merged with the default. **The `milestones` and `categories`
+   lists are REPLACED entirely** (not concatenated) — overlaying a
+   single category requires re-stating the whole list.
+3. Env-var quick overrides for the most common knobs:
+   `TRACKER_STATS_BUCKETS`, `TRACKER_STATS_START`,
+   `TRACKER_STATS_UPSTREAM_REPO`.
+
+See [`default-config.yaml`](default-config.yaml) for the full schema
+with inline documentation of every predicate key.
+
+### Categories (lifecycle bands)
+
+Mutually-exclusive states per tracker at each bucket-end snapshot,
+evaluated **top-to-bottom** with first-match-wins. Multiple rules can
+share a `name` to express disjoint branches of the same final
+category — the default set uses this for the `open / closed`
+fork on `fixed_released`. The set of distinct names defines the
+stack order in the lifecycle chart (overridable via the
+`stack_order:` config key).
+
+Supported predicate keys:
+
+| Key | Meaning |
+|---|---|
+| `state` | `open` / `closed` |
+| `state_reason` | `COMPLETED` / `NOT_PLANNED` / `REOPENED` / `null` |
+| `any_label` | at least one of the listed labels is present |
+| `all_labels` | every label in the list is present |
+| `not_label` | the named label must NOT be present |
+| `not_any_label` | none of the listed labels present |
+| `no_scope_label` (`true`/`false`) | tracker carries none of `scope_labels` |
+| `has_scope_label` (`true`/`false`) | tracker carries at least one of `scope_labels` |
+| `pr_merged_by_snapshot` (`true`/`false`) | a linked `<upstream>` PR is merged by the snapshot timestamp |
+| `any_of` / `all_of` | logical combinators (nestable) |
+
+Snapshot reconstruction replays each tracker's event stream
+(labeled / unlabeled / closed / reopened) chronologically from
+`{labels: [], state: OPEN}` at `createdAt`, evaluated at the
+bucket-end timestamp (Mar 31 / Jun 30 / Sep 30 / Dec 31 at 23:59:59 UTC
+for quarterly; calendar-month last day for monthly).
+
+### Time-to-triage signal
+
+First tracker comment whose author is on the roster (from
+`fetch_roster.py`) AND whose body matches any
+`triage.keywords[]` regex (case-insensitive). Falls back to
+the **first non-bot roster comment** when no keyword matches
+(useful for older trackers that predate the team's triage-comment
+convention). The `triage.bot_prefixes[]` list skips automated
+rollup / sync / import comments.
+
+### Milestones (vertical annotations)
+
+`milestones[]` produces a vertical dashed line + top-label annotation
+on every time-axis chart. Each entry needs `date: YYYY-MM-DD` (mapped
+onto the bucket axis) and `label`. Set `milestones: []` in an overlay
+to remove them entirely.
+
+### When `upstream_repo` is null
+
+The `c_prc` / `c_prm` / `c_rel` PR-driven mean-time charts are
+omitted, the `fetch_prs.py` stage is a silent no-op, and the
+`pr_merged_by_snapshot` predicate is always false (so the
+`open_pr_merged` snapshot back-fill rule is disabled). The
+remaining charts still render.
+
+## Prerequisites
+
+- `gh` authenticated with read access to `<tracker>` (and to
+  `<upstream>` for PR metadata, when configured).
+- `python3` (3.9+).
+- `jq` (only used by the fetch scripts via gh's `--jq` flag).
+- Network access to `api.github.com` and (for viewing) Plotly's CDN.
+- Optional: `pyyaml`. When missing, `render.py` falls back to a
+  bundled minimal YAML subset parser sufficient for
+  `default-config.yaml` and typical overlays. To pin a clean PyYAML
+  invocation, set `TRACKER_STATS_PY=uv-yaml` and the orchestrator
+  runs every step under `uv run --with pyyaml`.
+
+## Failure modes
+
+| Symptom | Cause | Fix |
+|---|---|---|
+| `events/<N>.json` missing for some N | gh transient failure during paginate | Re-run `run.sh`; `fetch_events.py` resumes from cache |
+| `prs.json` has `{"error": ...}` entries | False-positive body parse (PR# doesn't exist) | Silently filtered at render; safe to ignore |
+| `c_rel` median jumps after re-fetch | New advisory shipped since last run | Expected — re-render is correct |
+| Empty `c_prc` / `c_prm` / `c_rel` early buckets | No linked PR in those tracker buckets | Expected — not all early trackers had a fix PR |
+| `ModuleNotFoundError: yaml` | PyYAML missing | The bundled fallback parser handles `default-config.yaml`; for richer overlays install pyyaml or use `TRACKER_STATS_PY=uv-yaml` |
diff --git a/tools/security-tracker-stats-dashboard/default-config.yaml b/tools/security-tracker-stats-dashboard/default-config.yaml
new file mode 100644
index 0000000..12edaad
--- /dev/null
+++ b/tools/security-tracker-stats-dashboard/default-config.yaml
@@ -0,0 +1,135 @@
+# Default configuration for security-tracker-stats-dashboard.
+#
+# All knobs here can be overridden either by a YAML file pointed at by
+# the `TRACKER_STATS_CONFIG` env var (deep-merged with this default; the
+# `milestones` and `categories` lists are REPLACED entirely, not
+# concatenated), or by env-var quick overrides for the most common knobs
+# (`TRACKER_STATS_BUCKETS`, `TRACKER_STATS_START`, `TRACKER_STATS_UPSTREAM_REPO`).
+#
+# Defaults below match the reference `airflow-s/airflow-s` dashboard
+# byte-for-byte.
+
+buckets: monthly                # monthly | quarterly
+start: null                     # null = first tracker createdAt; else "YYYY-MM" (monthly) or "YYYY-Qn" (quarterly)
+upstream_repo: apache/airflow   # null -> skip c_prc/c_prm/c_rel charts and the back-fill rule
+
+milestones:
+  - date: 2026-04-20
+    label: skill adoption
+
+scope_labels: [airflow, providers, chart]
+
+# Categories - evaluated top-to-bottom, FIRST MATCH WINS. Multiple rules
+# can share the same `name` (and `color`) to express disjoint branches
+# of the same final category. The set of distinct names defines the
+# stacked-band order in the dashboard's lifecycle chart (preserved in
+# the order they FIRST appear in this list).
+#
+# Each predicate is conjunctive: ALL conditions must match. Supported keys:
+#   state                       open | closed
+#   state_reason                COMPLETED | NOT_PLANNED | REOPENED | null
+#   any_label                   list - at least one of these labels present
+#   all_labels                  list - every label in this list present
+#   not_label                   single label - must NOT be present
+#   not_any_label               list - none of these labels present
+#   no_scope_label              true  - tracker has none of the scope_labels
+#   has_scope_label             true  - tracker has at least one scope_label
+#   pr_merged_by_snapshot       true  - a linked upstream PR is merged at snapshot time
+# Logical combinators: any_of / all_of (nest as deep as you need).
+categories:
+  # --- Closed branch (mirrors `if not is_open:` in the reference). ----
+  - name: fixed_released
+    color: "#2ca02c"
+    predicate:
+      all_of:
+        state: closed
+        any_of:
+          - any_label: [fix released, "announced - emails sent", announced]
+          - all_of:
+              state_reason: COMPLETED
+              any_label: [cve allocated]
+  - name: closed_other
+    color: "#888888"
+    predicate:
+      state: closed
+
+  # --- Open branch (mirrors the reference's open-branch order). -------
+  - name: open_untriaged
+    color: "#d62728"
+    predicate:
+      all_of:
+        state: open
+        any_of:
+          - any_label: [needs triage]
+          - no_scope_label: true
+  # PR-merge-by-snapshot back-fill: an upstream PR has merged by the
+  # snapshot timestamp. Captures historical trackers that predate the
+  # `pr merged` label convention.
+  - name: open_pr_merged
+    color: "#e67e22"
+    predicate:
+      all_of:
+        state: open
+        pr_merged_by_snapshot: true
+        not_label: fix released
+  - name: fixed_released
+    color: "#2ca02c"
+    predicate:
+      all_of:
+        state: open
+        pr_merged_by_snapshot: true
+        any_label: [fix released]
+  - name: open_pr_merged
+    color: "#e67e22"
+    predicate:
+      all_of:
+        state: open
+        any_label: [pr merged]
+        not_label: fix released
+  - name: fixed_released
+    color: "#2ca02c"
+    predicate:
+      all_of:
+        state: open
+        any_label: [fix released, "announced - emails sent", announced]
+  - name: open_triaged
+    color: "#f1c40f"
+    predicate:
+      state: open
+
+# The order in which distinct category names FIRST appear above is the
+# stacked-band order top-to-bottom in the lifecycle chart. For the
+# defaults that resolves to: fixed_released, closed_other,
+# open_untriaged, open_pr_merged, open_triaged. The reference dashboard
+# uses a different stack order, however, so we re-pin it here:
+stack_order:
+  - fixed_released
+  - open_pr_merged
+  - open_triaged
+  - open_untriaged
+  - closed_other
+
+triage:
+  keywords:
+    - triage proposal
+    - proposed disposition
+    - VALID
+    - INVALID
+    - DEFENSE-IN-DEPTH
+    - INFO-ONLY
+    - PROBABLE-DUP
+    - looks like a valid security issue
+    - not a security issue
+    - not a vulnerability
+    - out of scope
+    - out-of-scope
+    - agreed
+    - Security Model
+    - cve-worthy
+    - CVE-worthy
+  bot_prefixes:
+    - "<!-- airflow-s status rollup v"
+    - "**Sync "
+    - "**Imported on "
+    - "**Status update"
+    - "**Allocated CVE"
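
The first-appearance rule described in the `categories` comments above can be sketched in a few lines of Python. This is an illustrative sketch rather than the renderer's own code: `first_appearance_order` is a hypothetical helper name, and each rule is reduced to its `name` key.

```python
def first_appearance_order(categories):
    """Return distinct category names in the order they first appear."""
    # dict preserves insertion order, so fromkeys() de-duplicates while
    # keeping the first occurrence of each name.
    return list(dict.fromkeys(c["name"] for c in categories))


# Rule names in the same order as the default `categories` list above.
default_rules = [
    {"name": n}
    for n in (
        "fixed_released", "closed_other", "open_untriaged",
        "open_pr_merged", "fixed_released", "open_pr_merged",
        "fixed_released", "open_triaged",
    )
]
default_order = first_appearance_order(default_rules)
print(default_order)
```

The resulting list matches the order spelled out in the comment before `stack_order`; an explicit `stack_order` then only changes how the bands are layered, not which names exist.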
diff --git a/tools/security-tracker-stats-dashboard/fetch_bodies.py b/tools/security-tracker-stats-dashboard/fetch_bodies.py
new file mode 100644
index 0000000..dccfd6b
--- /dev/null
+++ b/tools/security-tracker-stats-dashboard/fetch_bodies.py
@@ -0,0 +1,56 @@
+#!/usr/bin/env python3
+"""Fetch issue body + closedByPullRequestsReferences for every tracker
+issue and cache to `<TRACKER_STATS_CACHE>/issue_extra.json`."""
+
+import json
+import os
+import subprocess
+from concurrent.futures import ThreadPoolExecutor, as_completed
+
+ROOT = os.environ.get('TRACKER_STATS_CACHE', '/tmp/tracker-stats-cache')
+REPO = os.environ.get('TRACKER_STATS_REPO', 'airflow-s/airflow-s')
+OUT = f'{ROOT}/issue_extra.json'
+
+with open(f'{ROOT}/issues.json') as f:
+    issues = json.load(f)
+
+# Resume support
+cache = {}
+if os.path.exists(OUT):
+    with open(OUT) as f:
+        cache = json.load(f)
+    print(f"resume: {len(cache)} cached")
+
+todo = [i['number'] for i in issues if str(i['number']) not in cache]
+print(f"to fetch: {len(todo)}")
+
+
+def fetch(n):
+    try:
+        r = subprocess.run(
+            ['gh', 'issue', 'view', str(n), '--repo', REPO,
+             '--json', 'number,body,closedByPullRequestsReferences'],
+            capture_output=True, text=True, timeout=60,
+        )
+        if r.returncode != 0:
+            return n, {'error': r.stderr.strip()}
+        return n, json.loads(r.stdout)
+    except Exception as e:
+        return n, {'error': str(e)}
+
+
+done = 0
+with ThreadPoolExecutor(max_workers=10) as ex:
+    futs = {ex.submit(fetch, n): n for n in todo}
+    for fut in as_completed(futs):
+        n, data = fut.result()
+        cache[str(n)] = data
+        done += 1
+        if done % 25 == 0:
+            with open(OUT, 'w') as f:
+                json.dump(cache, f)
+            print(f"  {done}/{len(todo)}")
+
+with open(OUT, 'w') as f:
+    json.dump(cache, f)
+print(f"done: cached {len(cache)} → {OUT}")
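
The periodic checkpoint in `fetch_bodies.py` rewrites the cache file in place, so an interruption in the middle of a write could leave a truncated JSON file that the resume step then fails to load. One possible hardening is sketched below, assuming the temporary file and the target live on the same filesystem. `save_cache_atomically` is a hypothetical helper and is not part of this commit.

```python
import json
import os
import tempfile


def save_cache_atomically(cache, out_path):
    """Write `cache` as JSON to `out_path` via a temp file plus rename."""
    out_dir = os.path.dirname(os.path.abspath(out_path))
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(cache, f)
        # os.replace is atomic when source and target share a filesystem,
        # so readers see either the old file or the complete new one.
        os.replace(tmp_path, out_path)
    except BaseException:
        # Remove the partial temp file, then re-raise the original error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Under that assumption, the two `json.dump(cache, f)` call sites could be swapped for `save_cache_atomically(cache, OUT)` without changing the cache format.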
diff --git a/tools/security-tracker-stats-dashboard/fetch_events.py b/tools/security-tracker-stats-dashboard/fetch_events.py
new file mode 100644
index 0000000..a496b67
--- /dev/null
+++ b/tools/security-tracker-stats-dashboard/fetch_events.py
@@ -0,0 +1,61 @@
+#!/usr/bin/env python3
+"""Fetch per-issue label-history events. Resumes from cache."""
+
+import json
+import subprocess
+import concurrent.futures
+import os
+
+ROOT = os.environ.get('TRACKER_STATS_CACHE', '/tmp/tracker-stats-cache')
+REPO = os.environ.get('TRACKER_STATS_REPO', 'airflow-s/airflow-s')
+EVENTS_DIR = f'{ROOT}/events'
+
+with open(f'{ROOT}/issues.json') as f:
+    issues = json.load(f)
+
+numbers = [i['number'] for i in issues]
+print(f"Fetching events for {len(numbers)} issues...")
+
+os.makedirs(EVENTS_DIR, exist_ok=True)
+
+def fetch_one(n):
+    out_path = f'{EVENTS_DIR}/{n}.json'
+    if os.path.exists(out_path) and os.path.getsize(out_path) > 0:
+        return (n, True, 'cached')
+    try:
+        r = subprocess.run(
+            ['gh', 'api', f'repos/{REPO}/issues/{n}/events',
+             '--paginate',
+             '--jq', '[.[] | select(.event == "labeled" or .event == "unlabeled" or .event == "closed" or .event == "reopened") | {event, label: (.label.name // null), created_at}]'],
+            capture_output=True, text=True, timeout=60
+        )
+        if r.returncode != 0:
+            return (n, False, r.stderr[:200])
+        out = r.stdout.strip()
+        decoder = json.JSONDecoder()
+        idx = 0
+        merged = []
+        while idx < len(out):
+            while idx < len(out) and out[idx] in ' \n\r\t':
+                idx += 1
+            if idx >= len(out):
+                break
+            obj, n2 = decoder.raw_decode(out, idx)
+            merged.extend(obj)
+            idx = n2
+        with open(out_path, 'w') as f:
+            json.dump(merged, f)
+        return (n, True, f'{len(merged)} events')
+    except Exception as e:
+        return (n, False, str(e)[:200])
+
+with concurrent.futures.ThreadPoolExecutor(max_workers=10) as ex:
+    results = list(ex.map(fetch_one, numbers))
+
+ok = sum(1 for _, ok, _ in results if ok)
+fail = [(n, msg) for n, ok, msg in results if not ok]
+print(f"Done: {ok}/{len(numbers)} OK")
+if fail:
+    print("FAILURES:")
+    for n, msg in fail[:20]:
+        print(f"  #{n}: {msg}")
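
With `--paginate` and `--jq` together, `fetch_events.py` receives one JSON array per page, written back to back, which is why it walks the output with `raw_decode` instead of calling `json.loads` once. The same technique in isolation might look like the sketch below; `decode_concatenated_arrays` and the sample string are illustrative names and data, not part of the commit.

```python
import json


def decode_concatenated_arrays(text):
    """Merge back-to-back JSON arrays (one per page) into a single list."""
    decoder = json.JSONDecoder()
    merged = []
    idx = 0
    while idx < len(text):
        # raw_decode does not skip leading whitespace, so step over it here.
        if text[idx].isspace():
            idx += 1
            continue
        # raw_decode returns the parsed value and the index just past it.
        page, idx = decoder.raw_decode(text, idx)
        merged.extend(page)
    return merged


sample = '[{"event": "labeled"}]\n[{"event": "closed"}, {"event": "reopened"}]\n'
events = decode_concatenated_arrays(sample)
print(len(events))
```

An empty string yields an empty list, which corresponds to an issue with no matching events.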
diff --git a/tools/security-tracker-stats-dashboard/fetch_issues.py b/tools/security-tracker-stats-dashboard/fetch_issues.py
new file mode 100644
index 0000000..e1587ce
--- /dev/null
+++ b/tools/security-tracker-stats-dashboard/fetch_issues.py
@@ -0,0 +1,26 @@
+#!/usr/bin/env python3
+"""Dump all tracker issues (state=all, no PRs) to <cache>/issues.json."""
+
+import json
+import os
+import subprocess
+
+ROOT = os.environ.get('TRACKER_STATS_CACHE', '/tmp/tracker-stats-cache')
+REPO = os.environ.get('TRACKER_STATS_REPO', 'airflow-s/airflow-s')
+
+os.makedirs(ROOT, exist_ok=True)
+
+print(f"Fetching issue list from {REPO} (state=all, limit 1000) ...")
+r = subprocess.run(
+    ['gh', 'issue', 'list', '--repo', REPO, '--state', 'all', '--limit', '1000',
+     '--json', 'number,title,state,stateReason,createdAt,closedAt,labels,comments'],
+    capture_output=True, text=True, timeout=300,
+)
+if r.returncode != 0:
+    raise SystemExit(f"gh failed: {r.stderr}")
+
+issues = json.loads(r.stdout)
+with open(f'{ROOT}/issues.json', 'w') as f:
+    json.dump(issues, f)
+
+print(f"Wrote {len(issues)} issues to {ROOT}/issues.json")
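
Because `fetch_issues.py` passes `--limit 1000`, a tracker holding more issues than that would be cut off without any error. A small guard along the following lines could make that visible. This is a sketch only: `looks_truncated` and `warn_if_truncated` are hypothetical helpers, and `LIMIT` simply mirrors the value hardcoded in the script.

```python
import sys

LIMIT = 1000  # same value the script passes to `--limit`


def looks_truncated(issues, limit=LIMIT):
    """True when the result count reaches the cap, so more issues may exist."""
    return len(issues) >= limit


def warn_if_truncated(issues, limit=LIMIT):
    """Print a warning to stderr when the fetched list may be incomplete."""
    if looks_truncated(issues, limit):
        print(
            f"warning: fetched {len(issues)} issues, reaching the --limit cap "
            f"of {limit}; the tracker may contain more",
            file=sys.stderr,
        )
```

Calling `warn_if_truncated(issues)` right after `json.loads(r.stdout)` would be enough to surface the condition.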
diff --git a/tools/security-tracker-stats-dashboard/fetch_prs.py b/tools/security-tracker-stats-dashboard/fetch_prs.py
new file mode 100644
index 0000000..8ed7ca9
--- /dev/null
+++ b/tools/security-tracker-stats-dashboard/fetch_prs.py
@@ -0,0 +1,103 @@
+#!/usr/bin/env python3
+"""Fetch createdAt + mergedAt + state for every upstream-repo PR referenced
+by any tracker (via closedByPullRequestsReferences or body parse). Cache to
+`<TRACKER_STATS_CACHE>/prs.json`.
+
+The upstream repo is `$TRACKER_STATS_UPSTREAM_REPO` (default
+`apache/airflow`); set to `none` / `""` to skip this fetch entirely."""
+
+import json
+import os
+import re
+import subprocess
+from concurrent.futures import ThreadPoolExecutor, as_completed
+
+ROOT = os.environ.get('TRACKER_STATS_CACHE', '/tmp/tracker-stats-cache')
+UPSTREAM = os.environ.get('TRACKER_STATS_UPSTREAM_REPO', 'apache/airflow')
+if UPSTREAM in ('', 'none', 'null'):
+    print('TRACKER_STATS_UPSTREAM_REPO is empty/none - skipping PR fetch.')
+    raise SystemExit(0)
+
+EXTRA = f'{ROOT}/issue_extra.json'
+OUT = f'{ROOT}/prs.json'
+
+with open(EXTRA) as f:
+    extra = json.load(f)
+
+PR_PAT = re.compile(
+    rf'{re.escape(UPSTREAM)}#(\d+)|https://github\.com/{re.escape(UPSTREAM)}/pull/(\d+)',
+    re.I,
+)
+
+
+def extract_prs(v):
+    nums = set()
+    cb = v.get('closedByPullRequestsReferences') or []
+    for ref in cb:
+        if ref.get('repository', {}).get('nameWithOwner') == UPSTREAM:
+            nums.add(ref['number'])
+    body = v.get('body') or ''
+    # Accept upstream PR mentions (`<repo>#NNN` or a full pull URL)
+    # anywhere in the body, not only in the "PR with the fix" field
+    # (the spec allows either).
+    for m in PR_PAT.findall(body):
+        n = m[0] or m[1]
+        if n:
+            nums.add(int(n))
+    return nums
+
+
+# Build issue -> PR set + collect all unique PRs
+issue_to_prs = {}
+all_prs = set()
+for issue_n, v in extra.items():
+    prs = extract_prs(v)
+    issue_to_prs[issue_n] = sorted(prs)
+    all_prs.update(prs)
+
+# Save the issue_to_prs linkage map alongside
+with open(f'{ROOT}/issue_to_prs.json', 'w') as f:
+    json.dump(issue_to_prs, f)
+print(f"unique {UPSTREAM} PRs to fetch: {len(all_prs)}")
+
+# Resume support
+cache = {}
+if os.path.exists(OUT):
+    with open(OUT) as f:
+        cache = json.load(f)
+    print(f"resume: {len(cache)} cached")
+
+todo = [n for n in all_prs if str(n) not in cache]
+print(f"to fetch: {len(todo)}")
+
+
+def fetch(n):
+    try:
+        r = subprocess.run(
+            ['gh', 'pr', 'view', str(n), '--repo', UPSTREAM,
+             '--json', 'number,createdAt,mergedAt,state'],
+            capture_output=True, text=True, timeout=60,
+        )
+        if r.returncode != 0:
+            return n, {'error': r.stderr.strip()}
+        return n, json.loads(r.stdout)
+    except Exception as e:
+        return n, {'error': str(e)}
+
+
+done = 0
+with ThreadPoolExecutor(max_workers=12) as ex:
+    futs = {ex.submit(fetch, n): n for n in todo}
+    for fut in as_completed(futs):
+        n, data = fut.result()
+        cache[str(n)] = data
+        done += 1
+        if done % 25 == 0:
+            with open(OUT, 'w') as f:
+                json.dump(cache, f)
+            print(f"  {done}/{len(todo)}")
+
+with open(OUT, 'w') as f:
+    json.dump(cache, f)
+errs = sum(1 for v in cache.values() if 'error' in v)
+print(f"done: cached {len(cache)} PRs ({errs} errors) → {OUT}")
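
The body-parse step in `fetch_prs.py` depends on a two-branch regex: one branch for the `<repo>#NNN` shorthand and one for full pull-request URLs. The sketch below isolates that step so its behaviour is easier to check. `build_pr_pattern`, `extract_pr_numbers` and the sample body are illustrative assumptions, not code from the commit.

```python
import re


def build_pr_pattern(upstream):
    """Match `<upstream>#NNN` shorthand or a full GitHub pull URL."""
    repo = re.escape(upstream)
    return re.compile(
        rf"{repo}#(\d+)|https://github\.com/{repo}/pull/(\d+)", re.I
    )


def extract_pr_numbers(body, upstream):
    """Collect the distinct upstream PR numbers mentioned in `body`."""
    nums = set()
    # findall returns one (shorthand, url) tuple per match; exactly one
    # of the two groups is non-empty.
    for shorthand, url in build_pr_pattern(upstream).findall(body):
        nums.add(int(shorthand or url))
    return nums


body = (
    "PR with the fix: apache/airflow#12345\n"
    "Backport: https://github.com/Apache/Airflow/pull/678\n"
    "Unrelated: other/repo#999\n"
)
found = extract_pr_numbers(body, "apache/airflow")
print(sorted(found))
```

The pattern has no leading word boundary, so a repository whose slug merely ends with the upstream slug would also match; whether that matters depends on how tracker bodies are written.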
diff --git a/tools/security-tracker-stats-dashboard/fetch_roster.py b/tools/security-tracker-stats-dashboard/fetch_roster.py
new file mode 100644
index 0000000..2ef0f66
--- /dev/null
+++ b/tools/security-tracker-stats-dashboard/fetch_roster.py
@@ -0,0 +1,24 @@
+#!/usr/bin/env python3
+"""Dump the security-team roster (tracker repo's collaborators) to <cache>/roster.txt."""
+
+import os
+import subprocess
+
+ROOT = os.environ.get('TRACKER_STATS_CACHE', '/tmp/tracker-stats-cache')
+REPO = os.environ.get('TRACKER_STATS_REPO', 'airflow-s/airflow-s')
+
+os.makedirs(ROOT, exist_ok=True)
+
+r = subprocess.run(
+    ['gh', 'api', f'repos/{REPO}/collaborators', '--jq', '.[].login', '--paginate'],
+    capture_output=True, text=True, timeout=60,
+)
+if r.returncode != 0:
+    raise SystemExit(f"gh failed: {r.stderr}")
+
+logins = [ln.strip() for ln in r.stdout.splitlines() if ln.strip()]
+with open(f'{ROOT}/roster.txt', 'w') as f:
+    for ln in sorted(set(logins)):
+        f.write(ln + '\n')
+
+print(f"Wrote {len(set(logins))} roster handles to {ROOT}/roster.txt")
diff --git a/tools/security-tracker-stats-dashboard/render.py b/tools/security-tracker-stats-dashboard/render.py
new file mode 100644
index 0000000..ccf971f
--- /dev/null
+++ b/tools/security-tracker-stats-dashboard/render.py
@@ -0,0 +1,1121 @@
+#!/usr/bin/env python3
+"""
+Regenerate a tracker-stats dashboard. Reads cached issues+events+PR data
+from `$TRACKER_STATS_CACHE` (default `/tmp/tracker-stats-cache`) and writes
+a self-contained HTML page to `$TRACKER_STATS_OUT`.
+
+Configuration is loaded from `default-config.yaml` next to this script, optionally
+overlaid by a YAML file at `$TRACKER_STATS_CONFIG` (deep-merged; the
+`milestones` and `categories` lists are REPLACED entirely, not
+concatenated), then overlaid by these env-var quick overrides:
+
+    TRACKER_STATS_BUCKETS         monthly | quarterly
+    TRACKER_STATS_START           "YYYY-MM" (monthly) or "YYYY-Qn" (quarterly)
+    TRACKER_STATS_UPSTREAM_REPO   upstream repo slug (or "" / "none" to skip PR charts)
+    TRACKER_STATS_REPO            tracker repo slug (operational)
+    TRACKER_STATS_OUT             output path
+    TRACKER_STATS_CACHE           cache dir
+    TRACKER_STATS_CONFIG          path to a YAML overlay file
+
+Defaults match the reference `airflow-s/airflow-s` dashboard byte-for-byte.
+
+Mean-time charts (createdAt -> PR opened, PR opened -> PR merged, PR merged
+-> advisory announced) use real PR timestamps from the configured upstream
+repo, not the `pr created` / `pr merged` label-add events (those labels were
+only adopted in late 2025, so relying on them would erase pre-2026 history).
+When `upstream_repo` is null, those three charts are omitted and the snapshot
+back-fill rule is disabled.
+"""
+
+import calendar
+import json
+import os
+import re
+import statistics
+import datetime as dt
+from collections import defaultdict
+
+# --- YAML loader ----------------------------------------------------
+# Prefer pyyaml when available (handles every edge case). When it's not
+# installed, fall back to a tiny subset parser that covers the schema in
+# default-config.yaml only.
+try:
+    import yaml  # type: ignore
+
+    def yaml_load(text):
+        return yaml.safe_load(text)
+
+except ImportError:
+    def yaml_load(text):
+        return _minimal_yaml_load(text)
+
+
+def _minimal_yaml_load(text):
+    """Tiny YAML subset parser sufficient for default-config.yaml.
+
+    Supports: nested block mappings, block sequences (`- ...`), inline
+    flow lists `[a, b, "c d"]`, string scalars (with optional quotes),
+    integers, floats, booleans, null. Comments start at `#` outside of
+    quoted strings. No anchors, no merge keys, no flow mappings.
+    """
+    lines = []
+    for raw in text.splitlines():
+        # Strip comments outside of quotes.
+        in_q = None
+        out = []
+        i = 0
+        while i < len(raw):
+            ch = raw[i]
+            if in_q:
+                out.append(ch)
+                if ch == '\\' and i + 1 < len(raw):
+                    out.append(raw[i + 1])
+                    i += 2
+                    continue
+                if ch == in_q:
+                    in_q = None
+                i += 1
+                continue
+            if ch in ('"', "'"):
+                in_q = ch
+                out.append(ch)
+                i += 1
+                continue
+            if ch == '#':
+                break
+            out.append(ch)
+            i += 1
+        line = ''.join(out).rstrip()
+        if line.strip():
+            lines.append(line)
+
+    # Parse using indentation stack.
+    def indent_of(s):
+        return len(s) - len(s.lstrip(' '))
+
+    def scalar(s):
+        s = s.strip()
+        if not s:
+            return None
+        if s.lower() in ('null', '~'):
+            return None
+        if s.lower() == 'true':
+            return True
+        if s.lower() == 'false':
+            return False
+        if s.startswith('"') and s.endswith('"') and len(s) >= 2:
+            return s[1:-1].encode().decode('unicode_escape')
+        if s.startswith("'") and s.endswith("'") and len(s) >= 2:
+            return s[1:-1]
+        if s.startswith('[') and s.endswith(']'):
+            inner = s[1:-1].strip()
+            if not inner:
+                return []
+            return [scalar(x) for x in _split_flow_list(inner)]
+        try:
+            if '.' in s or 'e' in s or 'E' in s:
+                return float(s)
+            return int(s)
+        except ValueError:
+            return s
+
+    def _split_flow_list(inner):
+        parts = []
+        cur = []
+        in_q = None
+        depth = 0
+        for ch in inner:
+            if in_q:
+                cur.append(ch)
+                if ch == in_q:
+                    in_q = None
+                continue
+            if ch in ('"', "'"):
+                in_q = ch
+                cur.append(ch)
+                continue
+            if ch == '[':
+                depth += 1
+                cur.append(ch)
+                continue
+            if ch == ']':
+                depth -= 1
+                cur.append(ch)
+                continue
+            if ch == ',' and depth == 0:
+                parts.append(''.join(cur).strip())
+                cur = []
+                continue
+            cur.append(ch)
+        if cur:
+            parts.append(''.join(cur).strip())
+        return parts
+
+    def parse_block(idx, base_indent):
+        # Returns (value, next_idx). Inspects the first non-empty line
+        # at >= base_indent to decide mapping vs. sequence.
+        if idx >= len(lines):
+            return None, idx
+        first = lines[idx]
+        ind = indent_of(first)
+        if ind < base_indent:
+            return None, idx
+        if first.lstrip().startswith('- '):
+            return parse_seq(idx, ind)
+        return parse_map(idx, ind)
+
+    def parse_map(idx, base_indent):
+        out = {}
+        while idx < len(lines):
+            line = lines[idx]
+            ind = indent_of(line)
+            if ind < base_indent:
+                break
+            if ind > base_indent:
+                # Shouldn't happen at top of map.
+                break
+            stripped = line.strip()
+            if stripped.startswith('- '):
+                break
+            # key: value or key:
+            if ':' not in stripped:
+                idx += 1
+                continue
+            # Split on the first ':' that isn't inside quotes.
+            key, _, rest = _split_key_value(stripped)
+            rest = rest.strip()
+            idx += 1
+            if rest == '' or rest is None:
+                # Block child.
+                if idx < len(lines) and indent_of(lines[idx]) > base_indent:
+                    child, idx = parse_block(idx, indent_of(lines[idx]))
+                    out[key] = child
+                else:
+                    out[key] = None
+            else:
+                out[key] = scalar(rest)
+        return out, idx
+
+    def _split_key_value(stripped):
+        in_q = None
+        for i, ch in enumerate(stripped):
+            if in_q:
+                if ch == in_q:
+                    in_q = None
+                continue
+            if ch in ('"', "'"):
+                in_q = ch
+                continue
+            if ch == ':':
+                key = stripped[:i].strip()
+                rest = stripped[i + 1 :]
+                # Unquote key.
+                if (key.startswith('"') and key.endswith('"')) or (
+                    key.startswith("'") and key.endswith("'")
+                ):
+                    key = key[1:-1]
+                return key, ':', rest
+        return stripped, None, ''
+
+    def parse_seq(idx, base_indent):
+        out = []
+        while idx < len(lines):
+            line = lines[idx]
+            ind = indent_of(line)
+            if ind < base_indent:
+                break
+            if ind > base_indent:
+                break
+            stripped = line.strip()
+            if not stripped.startswith('- '):
+                break
+            after_dash = stripped[2:].rstrip()
+            # Item indent = base_indent + 2 (for "- ")
+            item_inner_indent = base_indent + 2
+            idx += 1
+            if after_dash == '':
+                # Block item, child lines.
+                if idx < len(lines) and indent_of(lines[idx]) > base_indent:
+                    child, idx = parse_block(idx, indent_of(lines[idx]))
+                    out.append(child)
+                else:
+                    out.append(None)
+                continue
+            if ':' in after_dash and not (
+                after_dash.startswith('"') or after_dash.startswith("'")
+            ):
+                # Inline first key-value of a mapping item. Treat the "- "
+                # as introducing a mapping whose first key is on this line.
+                key, _, rest = _split_key_value(after_dash)
+                rest = rest.strip()
+                item = {}
+                if rest == '':
+                    if idx < len(lines) and indent_of(lines[idx]) > item_inner_indent:
+                        child, idx = parse_block(idx, indent_of(lines[idx]))
+                        item[key] = child
+                    else:
+                        item[key] = None
+                else:
+                    item[key] = scalar(rest)
+                # Continue absorbing further keys at item_inner_indent.
+                while idx < len(lines):
+                    nline = lines[idx]
+                    nind = indent_of(nline)
+                    if nind < item_inner_indent:
+                        break
+                    if nind > item_inner_indent:
+                        break
+                    nstripped = nline.strip()
+                    if nstripped.startswith('- '):
+                        break
+                    if ':' not in nstripped:
+                        idx += 1
+                        continue
+                    nkey, _, nrest = _split_key_value(nstripped)
+                    nrest = nrest.strip()
+                    idx += 1
+                    if nrest == '':
+                        if idx < len(lines) and indent_of(lines[idx]) > item_inner_indent:
+                            child, idx = parse_block(idx, indent_of(lines[idx]))
+                            item[nkey] = child
+                        else:
+                            item[nkey] = None
+                    else:
+                        item[nkey] = scalar(nrest)
+                out.append(item)
+            else:
+                out.append(scalar(after_dash))
+        return out, idx
+
+    val, _ = parse_block(0, 0)
+    return val
+
+
+# --- Config loading -------------------------------------------------
+
+ROOT = os.environ.get('TRACKER_STATS_CACHE', '/tmp/tracker-stats-cache')
+OUT_PATH = os.environ.get('TRACKER_STATS_OUT', '/tmp/airflow_s_monthly.html')
+HERE = os.path.dirname(os.path.abspath(__file__))
+DEFAULT_CONFIG_PATH = os.path.join(HERE, 'default-config.yaml')
+
+
+def deep_merge(base, overlay):
+    """Deep-merge overlay into base. Lists are REPLACED (not concatenated)."""
+    if overlay is None:
+        return base
+    if not isinstance(base, dict) or not isinstance(overlay, dict):
+        return overlay
+    out = dict(base)
+    for k, v in overlay.items():
+        if k in out and isinstance(out[k], dict) and isinstance(v, dict):
+            out[k] = deep_merge(out[k], v)
+        else:
+            out[k] = v
+    return out
+
+
+def load_config():
+    with open(DEFAULT_CONFIG_PATH) as f:
+        cfg = yaml_load(f.read()) or {}
+    overlay_path = os.environ.get('TRACKER_STATS_CONFIG')
+    if overlay_path and os.path.exists(overlay_path):
+        with open(overlay_path) as f:
+            overlay = yaml_load(f.read()) or {}
+        cfg = deep_merge(cfg, overlay)
+    # Env-var quick overrides.
+    if os.environ.get('TRACKER_STATS_BUCKETS'):
+        cfg['buckets'] = os.environ['TRACKER_STATS_BUCKETS']
+    if 'TRACKER_STATS_START' in os.environ:
+        v = os.environ['TRACKER_STATS_START']
+        cfg['start'] = v if v else None
+    if 'TRACKER_STATS_UPSTREAM_REPO' in os.environ:
+        v = os.environ['TRACKER_STATS_UPSTREAM_REPO']
+        cfg['upstream_repo'] = None if v in ('', 'none', 'null') else v
+    return cfg
+
+
+CONFIG = load_config()
+
+BUCKETS_MODE = CONFIG.get('buckets', 'monthly')
+if BUCKETS_MODE not in ('monthly', 'quarterly'):
+    raise SystemExit(f"buckets must be 'monthly' or 'quarterly', got {BUCKETS_MODE!r}")
+
+START_OVERRIDE = CONFIG.get('start')
+UPSTREAM_REPO = CONFIG.get('upstream_repo')
+SCOPE_LABELS = set(CONFIG.get('scope_labels') or [])
+MILESTONES = CONFIG.get('milestones') or []
+CATEGORIES_CFG = CONFIG.get('categories') or []
+TRIAGE_KW = CONFIG.get('triage', {}).get('keywords') or []
+BOT_PREFIXES = tuple(CONFIG.get('triage', {}).get('bot_prefixes') or [])
+
+# Distinct category names in the order they FIRST appear in CATEGORIES_CFG
+# (multiple rules can share a name to express disjoint branches of the
+# same final category).
+_seen = set()
+CATS_DEFAULT_ORDER = []
+for c in CATEGORIES_CFG:
+    if c['name'] not in _seen:
+        _seen.add(c['name'])
+        CATS_DEFAULT_ORDER.append(c['name'])
+STACK_ORDER = CONFIG.get('stack_order') or CATS_DEFAULT_ORDER
+# CATS used for snapshot counting is the distinct-name set. Plotting uses
+# STACK_ORDER (which may re-order them for visual layering).
+CATS = list(CATS_DEFAULT_ORDER)
+CAT_COLORS = {}
+for c in CATEGORIES_CFG:
+    CAT_COLORS.setdefault(c['name'], c.get('color', '#888888'))
+
+
+# --- Cache load -----------------------------------------------------
+
+with open(f'{ROOT}/issues.json') as f:
+    issues = json.load(f)
+with open(f'{ROOT}/roster.txt') as f:
+    roster = {ln.strip() for ln in f if ln.strip()}
+with open(f'{ROOT}/issue_extra.json') as f:
+    issue_extra = json.load(f)
+
+prs_cache = {}
+if UPSTREAM_REPO:
+    prs_path = f'{ROOT}/prs.json'
+    if os.path.exists(prs_path):
+        with open(prs_path) as f:
+            prs_cache = json.load(f)
+
+NOW = dt.datetime(2026, 5, 21, 0, 0, 0, tzinfo=dt.timezone.utc)
+
+if UPSTREAM_REPO:
+    # Match the original literal in the body-parse regex so an upstream
+    # of `apache/airflow` still matches the historical pre-existing
+    # `apache/airflow#NNN` references byte-for-byte.
+    repo_re = re.escape(UPSTREAM_REPO)
+    PR_PAT = re.compile(
+        rf'{repo_re}#(\d+)|https://github\.com/{repo_re}/pull/(\d+)', re.I
+    )
+else:
+    PR_PAT = None
+
+
+# --- helpers --------------------------------------------------------
+
+def parse_dt(s):
+    if not s:
+        return None
+    return dt.datetime.fromisoformat(s.replace('Z', '+00:00'))
+
+
+# --- Bucket abstraction --------------------------------------------
+
+def month_of(d):
+    return d.year, d.month
+
+
+def quarter_of(d):
+    return d.year, (d.month - 1) // 3 + 1
+
+
+def month_label(y, m):
+    return f"{y}-{m:02d}"
+
+
+def quarter_label(y, q):
+    return f"{y}-Q{q}"
+
+
+def month_end(y, m):
+    last_day = calendar.monthrange(y, m)[1]
+    return dt.datetime(y, m, last_day, 23, 59, 59, tzinfo=dt.timezone.utc)
+
+
+def quarter_end(y, q):
+    # q in {1,2,3,4}
+    last_month = q * 3
+    last_day = calendar.monthrange(y, last_month)[1]
+    return dt.datetime(y, last_month, last_day, 23, 59, 59, tzinfo=dt.timezone.utc)
+
+
+def iter_months(y0, m0, y1, m1):
+    y, m = y0, m0
+    while (y, m) <= (y1, m1):
+        yield y, m
+        m += 1
+        if m == 13:
+            m = 1
+            y += 1
+
+
+def iter_quarters(y0, q0, y1, q1):
+    y, q = y0, q0
+    while (y, q) <= (y1, q1):
+        yield y, q
+        q += 1
+        if q == 5:
+            q = 1
+            y += 1
+
+
+if BUCKETS_MODE == 'monthly':
+    bucket_of = month_of
+    bucket_label = month_label
+    bucket_end = month_end
+    bucket_iter = iter_months
+else:
+    bucket_of = quarter_of
+    bucket_label = quarter_label
+    bucket_end = quarter_end
+    bucket_iter = iter_quarters
+
+
+# --- index issues + buckets ----------------------------------------
+
+issues_by_n = {i['number']: i for i in issues}
+earliest = min(parse_dt(i['createdAt']) for i in issues)
+
+if START_OVERRIDE:
+    if BUCKETS_MODE == 'monthly':
+        y0, m0 = (int(x) for x in START_OVERRIDE.split('-'))
+        start_key = (y0, m0)
+    else:
+        y_part, q_part = START_OVERRIDE.split('-Q')
+        start_key = (int(y_part), int(q_part))
+else:
+    start_key = bucket_of(earliest)
+
+end_key = bucket_of(NOW)
+buckets = list(bucket_iter(start_key[0], start_key[1], end_key[0], end_key[1]))
+bucket_labels = [bucket_label(*b) for b in buckets]
+n_buckets = len(buckets)
+
+print(f"earliest createdAt: {earliest.isoformat()} -> starts at {bucket_label(*start_key)}")
+print(f"now: {NOW.isoformat()} -> ends at {bucket_label(*end_key)}")
+print(f"buckets in range ({BUCKETS_MODE}): {n_buckets}")
+
+# Per-issue events
+events_by_n = {}
+for n in issues_by_n:
+    p = f'{ROOT}/events/{n}.json'
+    if os.path.exists(p) and os.path.getsize(p) > 0:
+        with open(p) as f:
+            events_by_n[n] = json.load(f)
+    else:
+        events_by_n[n] = []
+
+
+# --- tracker -> linked PR list (from body parse + closedBy) --------
+
+def extract_prs_for_issue(n):
+    if not UPSTREAM_REPO:
+        return set()
+    v = issue_extra.get(str(n)) or {}
+    nums = set()
+    for ref in (v.get('closedByPullRequestsReferences') or []):
+        if ref.get('repository', {}).get('nameWithOwner') == UPSTREAM_REPO:
+            nums.add(ref['number'])
+    body = v.get('body') or ''
+    if PR_PAT is not None:
+        for m in PR_PAT.findall(body):
+            x = m[0] or m[1]
+            if x:
+                nums.add(int(x))
+    return nums
+
+
+issue_prs = {n: extract_prs_for_issue(n) for n in issues_by_n}
+
+
+def pr_meta(num):
+    """Return dict(createdAt=dt, mergedAt=dt|None, state=str) or None."""
+    v = prs_cache.get(str(num))
+    if not v or 'error' in v:
+        return None
+    return {
+        'createdAt': parse_dt(v.get('createdAt')),
+        'mergedAt': parse_dt(v.get('mergedAt')),
+        'state': v.get('state'),
+    }
+
+
+def tracker_pr_signals(n):
+    earliest_created = None
+    earliest_created_pr = None
+    earliest_merged_ts = None
+    earliest_merged_pr = None
+    for prn in issue_prs.get(n, []):
+        meta = pr_meta(prn)
+        if meta is None:
+            continue
+        c = meta['createdAt']
+        if c is not None:
+            if earliest_created is None or c < earliest_created:
+                earliest_created = c
+                earliest_created_pr = prn
+        mt = meta['mergedAt']
+        if mt is not None:
+            if earliest_merged_ts is None or mt < earliest_merged_ts:
+                earliest_merged_ts = mt
+                earliest_merged_pr = prn
+    return {
+        'first_pr_created': earliest_created,
+        'first_pr_created_num': earliest_created_pr,
+        'first_pr_merged': earliest_merged_ts,
+        'first_pr_merged_num': earliest_merged_pr,
+    }
+
+
+tracker_signals = {n: tracker_pr_signals(n) for n in issues_by_n}
+
+
+# --- label timeline replay ------------------------------------------
+
+def labels_open_at(issue, ts):
+    n = issue['number']
+    created = parse_dt(issue['createdAt'])
+    if ts < created:
+        return None, None
+    labels = set()
+    is_open = True
+    for e in events_by_n.get(n, []):
+        et = parse_dt(e['created_at'])
+        if et > ts:
+            break
+        if e['event'] == 'labeled' and e.get('label'):
+            labels.add(e['label'])
+        elif e['event'] == 'unlabeled' and e.get('label'):
+            labels.discard(e['label'])
+        elif e['event'] == 'closed':
+            is_open = False
+        elif e['event'] == 'reopened':
+            is_open = True
+    return labels, is_open
+
+
+# --- Predicate evaluator -------------------------------------------
+
+def eval_predicate(pred, ctx):
+    """Evaluate a category predicate against a snapshot context.
+
+    `ctx` keys:
+        labels (set), is_open (bool), state_reason (str|None),
+        pr_merged_by_snapshot (bool).
+    """
+    if not isinstance(pred, dict):
+        return False
+    for key, val in pred.items():
+        if key == 'any_of':
+            if not any(eval_predicate(p, ctx) for p in val):
+                return False
+        elif key == 'all_of':
+            if isinstance(val, list):
+                if not all(eval_predicate(p, ctx) for p in val):
+                    return False
+            elif isinstance(val, dict):
+                if not eval_predicate(val, ctx):
+                    return False
+            else:
+                return False
+        elif key == 'state':
+            want_open = (val == 'open')
+            if ctx['is_open'] != want_open:
+                return False
+        elif key == 'state_reason':
+            if ctx['state_reason'] != val:
+                return False
+        elif key == 'any_label':
+            if not any(l in ctx['labels'] for l in val):
+                return False
+        elif key == 'all_labels':
+            if not all(l in ctx['labels'] for l in val):
+                return False
+        elif key == 'not_label':
+            if val in ctx['labels']:
+                return False
+        elif key == 'not_any_label':
+            if any(l in ctx['labels'] for l in val):
+                return False
+        elif key == 'no_scope_label':
+            has_scope = bool(ctx['labels'] & SCOPE_LABELS)
+            if val and has_scope:
+                return False
+            if not val and not has_scope:
+                return False
+        elif key == 'has_scope_label':
+            has_scope = bool(ctx['labels'] & SCOPE_LABELS)
+            if val and not has_scope:
+                return False
+            if not val and has_scope:
+                return False
+        elif key == 'pr_merged_by_snapshot':
+            if val and not ctx['pr_merged_by_snapshot']:
+                return False
+            if not val and ctx['pr_merged_by_snapshot']:
+                return False
+        else:
+            # Unknown key — fail safe.
+            return False
+    return True
+
+
+def classify_per_config(labels, is_open, ts, n):
+    issue = issues_by_n[n]
+    state_reason = issue.get('stateReason')
+    sig = tracker_signals.get(n, {})
+    fm = sig.get('first_pr_merged')
+    pr_merged_by_snapshot = bool(UPSTREAM_REPO and fm is not None and fm <= ts)
+    ctx = {
+        'labels': labels,
+        'is_open': is_open,
+        'state_reason': state_reason,
+        'pr_merged_by_snapshot': pr_merged_by_snapshot,
+    }
+    for cat in CATEGORIES_CFG:
+        if eval_predicate(cat['predicate'], ctx):
+            return cat['name']
+    return None
+
+
+# --- snapshot counts ------------------------------------------------
+
+counts = {cat: [0] * n_buckets for cat in CATS}
+backfill_trackers = set()
+
+for bi, b in enumerate(buckets):
+    be = bucket_end(*b)
+    ts = NOW if be > NOW else be
+    for i in issues:
+        labels, is_open = labels_open_at(i, ts)
+        if labels is None:
+            continue
+        cat = classify_per_config(labels, is_open, ts, i['number'])
+        if cat is None:
+            continue
+        counts[cat][bi] += 1
+
+        if cat == 'open_pr_merged' and is_open and 'pr merged' not in labels:
+            backfill_trackers.add(i['number'])
+
+# --- cumulative opened / closed ------------------------------------
+
+cum_opened = [0] * n_buckets
+cum_closed = [0] * n_buckets
+for bi, b in enumerate(buckets):
+    be = bucket_end(*b)
+    ts = NOW if be > NOW else be
+    op = 0
+    cl = 0
+    for i in issues:
+        ca = parse_dt(i['createdAt'])
+        if ca and ca <= ts:
+            op += 1
+        cz = parse_dt(i.get('closedAt'))
+        if cz and cz <= ts:
+            cl += 1
+    cum_opened[bi] = op
+    cum_closed[bi] = cl
+
+# --- Opened-in-bucket vs untriaged-at-bucket-end ------------------
+
+opened_in_b = [0] * n_buckets
+untriaged_at_bend = counts.get('open_untriaged', [0] * n_buckets)
+
+for i in issues:
+    ca = parse_dt(i['createdAt'])
+    if ca is None:
+        continue
+    cb = bucket_of(ca)
+    if cb < buckets[0] or cb > buckets[-1]:
+        continue
+    bi = buckets.index(cb)
+    opened_in_b[bi] += 1
+
+# --- triage / response ---------------------------------------------
+
+# Build the triage regex from config. Keep word-boundary wrapping for
+# the all-caps keywords so they don't match substrings inside other
+# words (mirrors the original handwritten regex).
+_kw_parts = []
+for kw in TRIAGE_KW:
+    if kw.isupper() and ' ' not in kw and '-' not in kw:
+        _kw_parts.append(rf'\b{re.escape(kw)}\b')
+    elif kw.isalpha() and kw.islower() and ' ' not in kw:
+        _kw_parts.append(rf'\b{re.escape(kw)}\b')
+    else:
+        _kw_parts.append(re.escape(kw))
+TRIAGE_RE = re.compile('|'.join(_kw_parts), re.IGNORECASE) if _kw_parts else None
+
+
+def is_bot_body(body):
+    if not body:
+        return False
+    b = body.lstrip()
+    for p in BOT_PREFIXES:
+        if b.startswith(p):
+            return True
+    return False
+
+
+triage_hours_by_b = defaultdict(list)
+resp_hours_by_b = defaultdict(list)
+n_fallback_triage = 0
+n_no_triage = 0
+all_triage_hours = []
+
+for i in issues:
+    created = parse_dt(i['createdAt'])
+    blbl = bucket_label(*bucket_of(created))
+    comments = i.get('comments', []) or []
+
+    first_roster = None
+    first_roster_keyword = None
+    for c in comments:
+        author = (c.get('author') or {}).get('login')
+        if not author or author not in roster:
+            continue
+        if is_bot_body(c.get('body') or ''):
+            continue
+        ct = parse_dt(c['createdAt'])
+        if first_roster is None:
+            first_roster = ct
+        if (
+            first_roster_keyword is None
+            and TRIAGE_RE is not None
+            and TRIAGE_RE.search(c.get('body') or '')
+        ):
+            first_roster_keyword = ct
+        if first_roster is not None and first_roster_keyword is not None:
+            break
+
+    if first_roster is not None:
+        hours = (first_roster - created).total_seconds() / 3600
+        resp_hours_by_b[blbl].append(hours)
+
+    triage_ts = first_roster_keyword if first_roster_keyword is not None else first_roster
+    if triage_ts is None:
+        n_no_triage += 1
+        continue
+    if first_roster_keyword is None:
+        n_fallback_triage += 1
+    hours = (triage_ts - created).total_seconds() / 3600
+    triage_hours_by_b[blbl].append(hours)
+    all_triage_hours.append(hours)
+
+
+def mean_or_none(xs):
+    return round(statistics.mean(xs), 2) if xs else None
+
+
+def per_b_series(by_b):
+    ys = []
+    ns = []
+    for b in buckets:
+        lbl = bucket_label(*b)
+        xs = by_b.get(lbl, [])
+        ys.append(mean_or_none(xs))
+        ns.append(len(xs))
+    return ys, ns
+
+
+triage_ys, triage_ns = per_b_series(triage_hours_by_b)
+resp_ys, resp_ns = per_b_series(resp_hours_by_b)
+
+triage_median = round(statistics.median(all_triage_hours), 2) if all_triage_hours else None
+triage_mean = round(statistics.mean(all_triage_hours), 2) if all_triage_hours else None
+triage_n = len(all_triage_hours)
+
+
+# --- PR-driven mean-time metrics -----------------------------------
+
+prc_by_b = defaultdict(list)
+prm_by_b = defaultdict(list)
+rel_by_b = defaultdict(list)
+
+
+def first_label_time(n, label):
+    for e in events_by_n.get(n, []):
+        if e['event'] == 'labeled' and e.get('label') == label:
+            return parse_dt(e['created_at'])
+    return None
+
+
+if UPSTREAM_REPO:
+    for i in issues:
+        n = i['number']
+        created = parse_dt(i['createdAt'])
+        sig = tracker_signals.get(n, {})
+
+        first_pr_c = sig.get('first_pr_created')
+        first_pr_m = sig.get('first_pr_merged')
+
+        if first_pr_c and created and first_pr_c >= created:
+            days = (first_pr_c - created).total_seconds() / 86400
+            prc_by_b[bucket_label(*bucket_of(created))].append(days)
+
+        if first_pr_m is not None:
+            prn = sig.get('first_pr_merged_num')
+            meta = pr_meta(prn) if prn else None
+            if meta and meta['createdAt'] and meta['mergedAt'] and meta['mergedAt'] >= meta['createdAt']:
+                days = (meta['mergedAt'] - meta['createdAt']).total_seconds() / 86400
+                prm_by_b[bucket_label(*bucket_of(meta['createdAt']))].append(days)
+
+        if first_pr_m is not None:
+            announced = (first_label_time(n, 'announced - emails sent')
+                         or first_label_time(n, 'announced'))
+            rel_ts = announced
+            if rel_ts is None:
+                ca = parse_dt(i.get('closedAt'))
+                state_reason = i.get('stateReason')
+                cur_labels = {l['name'] for l in i.get('labels', [])}
+                is_closed_completed = (i.get('state') == 'CLOSED' and state_reason == 'COMPLETED')
+                has_cve = 'cve allocated' in cur_labels
+                if ca and is_closed_completed and has_cve:
+                    rel_ts = ca
+            if rel_ts is not None and rel_ts > first_pr_m:
+                days = (rel_ts - first_pr_m).total_seconds() / 86400
+                rel_by_b[bucket_label(*bucket_of(first_pr_m))].append(days)
+
+
+prc_ys, prc_ns = per_b_series(prc_by_b)
+prm_ys, prm_ns = per_b_series(prm_by_b)
+rel_ys, rel_ns = per_b_series(rel_by_b)
+
+
+def n_buckets_with_data(by_b):
+    return sum(1 for k, xs in by_b.items() if xs)
+
+
+def overall_median(by_b):
+    flat = [x for xs in by_b.values() for x in xs]
+    return round(statistics.median(flat), 2) if flat else None
+
+
+def overall_n(by_b):
+    return sum(len(xs) for xs in by_b.values())
+
+
+# --- KPIs ----------------------------------------------------------
+
+total = len(issues)
+open_now = sum(1 for i in issues if i.get('state') == 'OPEN')
+closed_now = total - open_now
+
+def latest(cat):
+    return counts[cat][-1] if cat in counts else 0
+
+
+print(f"total trackers: {total}")
+print(f"open: {open_now}, closed: {closed_now}")
+print(f"fixed_released (latest bucket): {latest('fixed_released')}")
+print(f"open_untriaged: {latest('open_untriaged')}, open_triaged: {latest('open_triaged')}, "
+      f"open_pr_merged: {latest('open_pr_merged')}, closed_other: {latest('closed_other')}")
+print(f"triage median {triage_median}h, mean {triage_mean}h, n={triage_n} "
+      f"(fallback={n_fallback_triage}, none={n_no_triage})")
+
+if UPSTREAM_REPO:
+    print()
+    print("PR-driven mean-time series:")
+    for name, by_b in [
+        ('c_prc', prc_by_b),
+        ('c_prm', prm_by_b),
+        ('c_rel', rel_by_b),
+    ]:
+        print(f"  {name}: n={overall_n(by_b)} median={overall_median(by_b)} "
+              f"buckets_with_data={n_buckets_with_data(by_b)}")
+
+print()
+print(f"open_pr_merged back-fill: {len(backfill_trackers)} trackers were re-classified "
+      f"from open_triaged -> open_pr_merged in at least one historical bucket "
+      f"because of the PR-merge-date rule")
+print()
+print(f"Latest bucket ({bucket_labels[-1]}) opened-vs-untriaged: "
+      f"opened_in_b={opened_in_b[-1]}, untriaged_at_bend={untriaged_at_bend[-1]}")
+
+
+# --- Render HTML ---------------------------------------------------
+
+def js_array(xs, fmt_null='null'):
+    parts = []
+    for x in xs:
+        if x is None:
+            parts.append(fmt_null)
+        elif isinstance(x, float):
+            parts.append(f"{x:.2f}" if not (x == int(x)) else f"{int(x)}")
+        else:
+            parts.append(str(x))
+    return '[' + ', '.join(parts) + ']'
+
+
+def js_quotes(xs):
+    return '[' + ', '.join(f'"{x}"' for x in xs) + ']'
+
+
+def milestone_x(milestone_date):
+    """Map a milestone date (YYYY-MM-DD) onto a bucket-axis label."""
+    y = int(milestone_date[:4])
+    mo = int(milestone_date[5:7])
+    if BUCKETS_MODE == 'monthly':
+        return f"{y}-{mo:02d}"
+    return f"{y}-Q{(mo - 1) // 3 + 1}"
+
+
+# Title prefix differs between bucket modes for clarity.
+bucket_word = 'month' if BUCKETS_MODE == 'monthly' else 'quarter'
+
+# Build stacked-band traces in STACK_ORDER. With the default config that
+# resolves to `fixed_released, open_pr_merged, open_triaged,
+# open_untriaged, closed_other` — matching the reference dashboard.
+stacked_traces = []
+for cat in STACK_ORDER:
+    if cat not in counts:
+        continue
+    color = CAT_COLORS.get(cat, '#888888')
+    ys = js_array(counts[cat])
+    stacked_traces.append(
+        f"  {{x: buckets, y: {ys},  name: '{cat}',  stackgroup: 'one', "
+        f"type: 'scatter', mode: 'lines', line: {{color: '{color}', width: 0}}, "
+        f"fillcolor: '{color}', hoveron: 'points+fills'}}"
+    )
+stacked_block = ',\n'.join(stacked_traces)
+
+# Milestone shapes + annotations (multi-milestone capable).
+ms_shapes = []
+ms_annots = []
+for ms in MILESTONES:
+    ms_date = ms.get('date')
+    ms_label = ms.get('label') or 'milestone'
+    if not ms_date:
+        continue
+    x_val = milestone_x(str(ms_date))
+    ms_shapes.append(
+        "{type: 'line', xref: 'x', yref: 'paper', x0: '" + x_val
+        + "', x1: '" + x_val
+        + "', y0: 0, y1: 1, line: {color: '#888', width: 1.5, dash: 'dash'}}"
+    )
+    ms_annots.append(
+        "{xref: 'x', yref: 'paper', x: '" + x_val
+        + "', y: 1.04, xanchor: 'left', text: '↓ " + ms_label + " (" + str(ms_date) + ")', "
+        + "showarrow: false, font: {size: 11, color: '#666'}}"
+    )
+shapes_js = '[' + ', '.join(ms_shapes) + ']'
+annots_js = '[' + ', '.join(ms_annots) + ']'
+
+
+# Build the optional PR-charts HTML and JS sections.
+if UPSTREAM_REPO:
+    pr_cards_html = (
+        '<div class="card"><div id="c_prc"></div></div>\n'
+        '<div class="card"><div id="c_prm"></div></div>\n'
+        '<div class="card"><div id="c_rel"></div></div>\n'
+    )
+    pr_charts_js = (
+        f"meanChart('c_prc',    'Mean time createdAt → PR opened (days)',  "
+        f"{js_array(prc_ys)}, {js_array(prc_ns)}, 'd', '#16a085');\n"
+        f"meanChart('c_prm',    'Mean time PR-open → PR-merged (days)',    "
+        f"{js_array(prm_ys)}, {js_array(prm_ns)}, 'd', '#2980b9');\n"
+        f"meanChart('c_rel',    'Mean time PR-merged → advisory announced (days)', "
+        f"{js_array(rel_ys)}, {js_array(rel_ns)}, 'd', '#d35400');"
+    )
+else:
+    pr_cards_html = ''
+    pr_charts_js = ''
+
+
+HTML = f"""<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="utf-8">
+<title>tracker {bucket_word}ly statistics</title>
+<script src="https://cdn.plot.ly/plotly-2.35.2.min.js"></script>
+<style>
+body {{ font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif; margin: 0 auto; padding: 16px; color: #222; max-width: 1400px; }}
+.grid {{ display: grid; grid-template-columns: 1fr 1fr; gap: 16px; }}
+.card {{ border: 1px solid #e0e0e0; border-radius: 8px; padding: 8px; background: #fafafa; }}
+.card.full {{ grid-column: 1 / -1; }}
+</style>
+</head>
+<body>
+
+<div class="grid">
+
+<div class="card full"><div id="c_states"></div></div>
+<div class="card full"><div id="c_open_vs_untriaged"></div></div>
+<div class="card full"><div id="c_cum"></div></div>
+<div class="card"><div id="c_triage"></div></div>
+<div class="card"><div id="c_resp"></div></div>
+{pr_cards_html}
+</div>
+
+<script>
+const buckets = {js_quotes(bucket_labels)};
+
+function lineOpts() {{ return {{ type: 'scatter', mode: 'lines+markers', connectgaps: true }}; }}
+
+// Milestone markers (config-driven).
+const milestoneShapes = {shapes_js};
+const milestoneAnnotations = {annots_js};
+const MILESTONES_LAYOUT = {{shapes: milestoneShapes, annotations: milestoneAnnotations}};
+
+// Stacked-line lifecycle bands
+Plotly.newPlot('c_states', [
+{stacked_block}
+], {{
+  ...MILESTONES_LAYOUT,
+  title: 'Issue lifecycle bands (stacked, end-of-{bucket_word} snapshots)',
+  yaxis: {{title: 'tracker count'}},
+  legend: {{orientation: 'h'}},
+  hovermode: 'x unified'
+}});
+
+// Opened-in-bucket vs untriaged-at-bucket-end
+Plotly.newPlot('c_open_vs_untriaged', [
+  {{x: buckets, y: {js_array(opened_in_b)},        name: 'opened in {bucket_word}',
+    type: 'scatter', mode: 'lines+markers', connectgaps: true,
+    line: {{color: '#1f77b4'}}}},
+  {{x: buckets, y: {js_array(untriaged_at_bend)},  name: 'untriaged at {bucket_word}-end',
+    type: 'scatter', mode: 'lines+markers', connectgaps: true,
+    line: {{color: '#d62728'}}}}
+], {{
+  ...MILESTONES_LAYOUT,
+  title: 'Opened vs. untriaged backlog (per {bucket_word})',
+  yaxis: {{title: 'count'}},
+  legend: {{orientation: 'h'}}
+}});
+
+Plotly.newPlot('c_cum', [
+  {{x: buckets, y: {js_array(cum_opened)}, name: 'cumulative opened',
+    type: 'scatter', mode: 'lines+markers', connectgaps: true,
+    line: {{color: '#1f77b4'}}, fill: 'tozeroy'}},
+  {{x: buckets, y: {js_array(cum_closed)}, name: 'cumulative closed',
+    type: 'scatter', mode: 'lines+markers', connectgaps: true,
+    line: {{color: '#2ca02c'}}, fill: 'tozeroy'}}
+], {{
+  ...MILESTONES_LAYOUT,
+  title: 'Cumulative opened vs. closed (gap = open backlog)',
+  yaxis: {{title: 'count'}},
+  legend: {{orientation: 'h'}}
+}});
+
+function meanChart(divId, title, ys, ns, unit, color) {{
+  Plotly.newPlot(divId, [{{
+    x: buckets, y: ys,
+    type: 'scatter', mode: 'lines+markers', connectgaps: true,
+    text: ns.map(n => 'n=' + n),
+    hovertemplate: '%{{x}}<br>mean: %{{y:.2f}} ' + unit + '<br>%{{text}}<extra></extra>',
+    line: {{color: color}}
+  }}], {{
+    ...MILESTONES_LAYOUT,
+    title: title,
+    yaxis: {{title: 'mean ' + unit, rangemode: 'tozero'}}
+  }});
+}}
+
+meanChart('c_triage', 'Mean time to triage (hours)',          {js_array(triage_ys)}, {js_array(triage_ns)}, 'h', '#c0392b');
+meanChart('c_resp',   'Mean time to first response (hours)',  {js_array(resp_ys)}, {js_array(resp_ns)}, 'h', '#8e44ad');
+{pr_charts_js}
+</script>
+</body>
+</html>
+"""
+
+with open(OUT_PATH, 'w', encoding='utf-8') as f:
+    f.write(HTML)
+
+print(f"\nWrote {OUT_PATH} ({len(HTML)} bytes)")
diff --git a/tools/security-tracker-stats-dashboard/run.sh b/tools/security-tracker-stats-dashboard/run.sh
new file mode 100755
index 0000000..9e7e01a
--- /dev/null
+++ b/tools/security-tracker-stats-dashboard/run.sh
@@ -0,0 +1,58 @@
+#!/bin/bash
+# Orchestrator - fetch all data then render the dashboard.
+#
+# Usage: run.sh [output-path]
+#
+# Env overrides:
+#   TRACKER_STATS_CACHE          (default: /tmp/tracker-stats-cache)
+#   TRACKER_STATS_OUT            (default: /tmp/airflow_s_monthly.html - or arg $1)
+#   TRACKER_STATS_REPO           tracker repo (default: airflow-s/airflow-s)
+#   TRACKER_STATS_BUCKETS        monthly | quarterly (overlay)
+#   TRACKER_STATS_START          "YYYY-MM" / "YYYY-Qn" (overlay)
+#   TRACKER_STATS_UPSTREAM_REPO  upstream repo slug or "none" (overlay)
+#   TRACKER_STATS_CONFIG         path to a YAML overlay file
+#
+# render.py reads its config from `scripts/default-config.yaml`,
+# optionally overlaid by $TRACKER_STATS_CONFIG and the env-var quick
+# overrides above. See default-config.yaml for the schema.
+
+set -e
+HERE="$(cd "$(dirname "$0")" && pwd)"
+
+if [ -n "$1" ]; then
+  export TRACKER_STATS_OUT="$1"
+fi
+
+# Prefer python with PyYAML if available; render.py falls back to a tiny
+# built-in YAML subset parser when pyyaml is missing. Adopters who use
+# `uv` can opt in to a clean PyYAML invocation by setting
+# TRACKER_STATS_PY=uv-yaml; default is plain python3.
+PY="${TRACKER_STATS_PY:-python3}"
+case "$PY" in
+  uv-yaml)
+    PY_CMD=(uv run --with pyyaml python3)
+    ;;
+  *)
+    PY_CMD=("$PY")
+    ;;
+esac
+
+echo "-> fetch_issues"
+"${PY_CMD[@]}" "$HERE/fetch_issues.py"
+
+echo "-> fetch_roster"
+"${PY_CMD[@]}" "$HERE/fetch_roster.py"
+
+echo "-> fetch_bodies"
+"${PY_CMD[@]}" "$HERE/fetch_bodies.py"
+
+echo "-> fetch_events"
+"${PY_CMD[@]}" "$HERE/fetch_events.py"
+
+echo "-> fetch_prs"
+"${PY_CMD[@]}" "$HERE/fetch_prs.py"
+
+echo "-> render"
+"${PY_CMD[@]}" "$HERE/render.py"
+
+echo "done: ${TRACKER_STATS_OUT:-/tmp/airflow_s_monthly.html}"
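
For readers following the predicate evaluator in render.py: the sketch
below is a trimmed-down stand-in, not the code from the patch. It handles
only a subset of the predicate keys, and the category list (including the
`needs fix` label) is hypothetical, made up for illustration. It shows the
two behaviours that matter when writing an overlay: every key in a predicate
must hold, and the first matching category wins.

```python
def matches(pred, ctx):
    """Return True when every key in `pred` holds for the snapshot `ctx`."""
    if not isinstance(pred, dict):
        return False
    for key, val in pred.items():
        if key == 'any_of':
            if not any(matches(p, ctx) for p in val):
                return False
        elif key == 'state':
            if ctx['is_open'] != (val == 'open'):
                return False
        elif key == 'any_label':
            if not any(label in ctx['labels'] for label in val):
                return False
        elif key == 'not_any_label':
            if any(label in ctx['labels'] for label in val):
                return False
        elif key == 'pr_merged_by_snapshot':
            if bool(val) != ctx['pr_merged_by_snapshot']:
                return False
        else:
            # Unknown key: fail safe, mirroring the patch.
            return False
    return True


# Hypothetical category list; order matters because the first match wins.
CATEGORIES = [
    {'name': 'open_pr_merged',
     'predicate': {'state': 'open', 'pr_merged_by_snapshot': True}},
    {'name': 'open_triaged',
     'predicate': {'state': 'open',
                   'any_label': ['needs fix', 'cve allocated']}},
    {'name': 'open_untriaged',
     'predicate': {'state': 'open'}},
]


def classify(ctx):
    """Return the name of the first category whose predicate matches."""
    for cat in CATEGORIES:
        if matches(cat['predicate'], ctx):
            return cat['name']
    return None


ctx = {
    'labels': {'cve allocated'},
    'is_open': True,
    'pr_merged_by_snapshot': False,
}
result = classify(ctx)
print(result)
```

Because an unknown key makes the whole predicate fail, a typo in an overlay
silently drops trackers from that category instead of raising an error, so
it is worth checking the printed per-category counts after editing one.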

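The triage-keyword regex in render.py wraps single-word keywords in word
boundaries and leaves phrases as plain escaped literals. The sketch below
restates that rule as a function so its effect can be tried in isolation.
The function name `build_triage_re` and the keyword list are assumptions
for illustration; the real keywords come from the configuration.

```python
import re


def build_triage_re(keywords):
    """Compile a case-insensitive alternation of triage keywords.

    Single all-caps or all-lowercase words get word-boundary wrapping so
    they do not match inside longer words; anything else is matched as an
    escaped literal.
    """
    parts = []
    for kw in keywords:
        single_word = ' ' not in kw and '-' not in kw
        if single_word and (kw.isupper() or (kw.isalpha() and kw.islower())):
            parts.append(rf'\b{re.escape(kw)}\b')
        else:
            parts.append(re.escape(kw))
    return re.compile('|'.join(parts), re.IGNORECASE) if parts else None


# Hypothetical keyword list, not the shipped default.
triage_re = build_triage_re(['CVE', 'valid', 'not an issue'])

hit_word = triage_re.search('This looks VALID to me') is not None
hit_substring = triage_re.search('the cache was invalidated') is not None
hit_phrase = triage_re.search('I think this is Not An Issue') is not None
print(hit_word, hit_substring, hit_phrase)
```

An empty keyword list yields `None` rather than a compiled pattern, which
is why the comment loop in render.py checks `TRIAGE_RE is not None` before
searching.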