This is an automated email from the ASF dual-hosted git repository.
potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git
The following commit(s) were added to refs/heads/main by this push:
new f2104ed Add security-tracker-stats-dashboard tool + skill (#248)
f2104ed is described below
commit f2104ed463f1b22a3547066543738823f9a9bcd8
Author: Jarek Potiuk <[email protected]>
AuthorDate: Fri May 22 20:03:29 2026 +0200
Add security-tracker-stats-dashboard tool + skill (#248)
Generalised from the airflow-s reference dashboard, this adds a
read-only stats dashboard for any apache-steward adopter. Output is a
self-contained HTML page with Plotly charts: lifecycle bands (per
adopter-configurable categories), opened-vs-untriaged backlog,
cumulative opened/closed, mean time to triage, mean time to first
response, and PR-driven mean times (createdAt -> PR-opened, PR-open
-> PR-merged, PR-merged -> advisory announced).
Configuration lives in a YAML overlay the adopter places at
`.apache-steward-overrides/security-tracker-stats.yaml` (path configurable
in `<project-config>/security-tracker-stats.md`). Defaults reproduce the
airflow-s reference implementation byte-for-byte; everything that
was hardcoded there (bucket granularity, milestones, scope labels,
category predicates, triage keywords, bot prefixes, upstream repo)
is now an overrideable knob.
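As a rough illustration of how such an overlay could combine with the defaults, here is a minimal sketch. It assumes the semantics the README in this commit describes (mappings are deep-merged, while the `milestones` and `categories` lists are replaced wholesale). The `deep_merge` helper and the sample values are hypothetical and not taken from `render.py`; replacing every other list as well is an assumption of this sketch.

```python
import copy


def deep_merge(base, overlay):
    """Return a new dict with `overlay` merged onto `base` (hypothetical helper)."""
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested mappings are merged key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, None and lists replace the default entirely,
            # so `milestones: []` removes every default milestone.
            merged[key] = copy.deepcopy(value)
    return merged


# Sample values for illustration only (a small subset of the real schema).
defaults = {
    "buckets": "monthly",
    "upstream_repo": "apache/airflow",
    "milestones": [{"date": "2026-04-20", "label": "skill adoption"}],
    "triage": {"keywords": ["triage proposal"], "bot_prefixes": ["**Sync "]},
}
overlay = {
    "buckets": "quarterly",
    "milestones": [],
    "triage": {"keywords": ["agreed"]},
}
config = deep_merge(defaults, overlay)
```

With these inputs the overlay switches the bucket granularity, clears the milestones, and swaps the triage keywords while the default `bot_prefixes` survive untouched.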
The skill at `.claude/skills/security-tracker-stats-dashboard/SKILL.md`
follows the framework's standard structural template (placeholder
convention header, adopter-override section, snapshot-drift section,
prerequisites, inputs, how-to-invoke, golden rules, failure modes).
The renderer prefers PyYAML when available and falls back to a tiny
bundled YAML subset parser when it is not, so adopters without a
build step still get the dashboard.
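The optional-dependency pattern described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the code in `render.py`: `parse_flat_yaml` and `load_config_text` are hypothetical names, and the fallback here only understands flat `key: value` lines, which is far less than the bundled subset parser would need for nested categories.

```python
try:
    import yaml  # PyYAML, optional
except ImportError:
    yaml = None


def parse_flat_yaml(text):
    """Parse flat `key: value` lines only (illustrative fallback, not a YAML parser)."""
    result = {}
    for raw in text.splitlines():
        # Drop inline comments introduced by " #", then surrounding whitespace.
        line = raw.split(" #", 1)[0].strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        if value in ("null", "~", ""):
            result[key.strip()] = None
        elif len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            result[key.strip()] = value[1:-1]  # strip matching quotes
        else:
            result[key.strip()] = value
    return result


def load_config_text(text):
    """Prefer PyYAML when importable; otherwise use the tiny fallback."""
    if yaml is not None:
        return yaml.safe_load(text) or {}
    return parse_flat_yaml(text)


sample = "buckets: monthly  # monthly | quarterly\nstart: null\nupstream_repo: apache/airflow\n"
loaded = load_config_text(sample)
```

For this flat sample both code paths yield the same mapping; richer overlays (nested predicates, typed scalars) are where a real parser matters.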
---
.../security-tracker-stats-dashboard/SKILL.md | 267 +++++
projects/_template/security-tracker-stats.md | 139 +++
tools/security-tracker-stats-dashboard/README.md | 181 ++++
.../default-config.yaml | 135 +++
.../fetch_bodies.py | 56 +
.../fetch_events.py | 61 ++
.../fetch_issues.py | 26 +
.../security-tracker-stats-dashboard/fetch_prs.py | 103 ++
.../fetch_roster.py | 24 +
tools/security-tracker-stats-dashboard/render.py | 1121 ++++++++++++++++++++
tools/security-tracker-stats-dashboard/run.sh | 58 +
11 files changed, 2171 insertions(+)
diff --git a/.claude/skills/security-tracker-stats-dashboard/SKILL.md b/.claude/skills/security-tracker-stats-dashboard/SKILL.md
new file mode 100644
index 0000000..d940c0b
--- /dev/null
+++ b/.claude/skills/security-tracker-stats-dashboard/SKILL.md
@@ -0,0 +1,267 @@
+---
+name: security-tracker-stats-dashboard
+description: |
+ Generate a self-contained HTML dashboard of `<tracker>` repository
+ statistics: issue-lifecycle bands (untriaged / triaged / PR-merged /
+ fixed-released / closed-other), opened-vs-untriaged backlog,
+ cumulative opened/closed, mean time to triage, mean time to first
+ response, and — when `<upstream>` is configured — mean time
+ createdAt -> PR-opened, PR-open -> PR-merged, and PR-merged ->
+ advisory announced. All charts are line / area (no bars) with
+ `connectgaps: true`. Vertical annotations on every chart mark the
+ milestones declared in the project's overlay (e.g. "skill
+ adoption", "team handover", "process change").
+when_to_use: |
+ Invoke when the user says "regenerate the tracker dashboard", "show
+ monthly/quarterly stats", "tracker stats", "dashboard", or
+ variations. Also when an existing dashboard at the configured output
+ path is stale (older than ~24 h) and the user is reviewing tracker
+ health. Read-only — the skill never modifies any tracker state.
+license: Apache-2.0
+---
+
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!-- Placeholder convention (see AGENTS.md#placeholder-convention-used-in-skill-files):
+ <project-config> -> adopting project's `.apache-steward/` directory
+ <framework> -> framework root (the `.apache-steward/`
+ snapshot in an adopter repo, or `.` in the
+ framework standalone checkout)
+ <tracker> -> value of `tracker_repo:` in <project-config>/project.md
+ (example: airflow-s/airflow-s)
+ <upstream> -> value of `upstream_repo:` in <project-config>/project.md
+ (example: apache/airflow); may be null for
+ trackers whose fixes do not land in a
+ single upstream codebase.
+ Before running any bash command below, substitute these with the
+ concrete values from the adopting project's <project-config>/project.md. -->
+
+# security-tracker-stats-dashboard
+
+Read-only skill that renders a self-contained HTML page summarising
+the state of `<tracker>` over time. The skill wraps the
+[`tools/security-tracker-stats-dashboard/`](../../../tools/security-tracker-stats-dashboard/README.md)
+runtime tool — both the slash-command path (this skill) and the
+script path (`run.sh`) run the same fetch + render pipeline; the
+skill adds invocation niceties (resolving cache paths, surfacing the
+output URL, proposing a stale-cache refresh) but never mutates
+anything.
+
+The skill is **read-only on GitHub** — it does not create or modify
+issues, comments, labels, or PRs. It only fetches data via `gh` and
+renders an HTML file.
+
+---
+
+## Adopter overrides
+
+Before running the default behaviour documented
+below, this skill consults
+[`.apache-steward-overrides/security-tracker-stats-dashboard.md`](../../../docs/setup/agentic-overrides.md)
+in the adopter repo if it exists, and applies any
+agent-readable overrides it finds. See
+[`docs/setup/agentic-overrides.md`](../../../docs/setup/agentic-overrides.md)
+for the contract — what overrides may contain, hard
+rules, the reconciliation flow on framework upgrade,
+upstreaming guidance.
+
+Configuration for the *renderer* (bucket granularity, milestones,
+categories, scope labels, triage keywords, …) lives in a separate
+YAML file the adopter places at
+`.apache-steward-overrides/security-tracker-stats.yaml` (path is
+adopter-configurable via `tracker_stats_config:` in
+[`<project-config>/security-tracker-stats.md`](../../../projects/_template/security-tracker-stats.md)).
+The agentic override file above is reserved for *behavioural*
+overrides of this skill (when to propose a refresh, where to write
+the HTML, etc.); renderer knobs go in the YAML config.
+
+**Hard rule**: agents NEVER modify the snapshot under
+`<adopter-repo>/.apache-steward/`. Local modifications
+go in the override file. Framework changes go via PR
+to `apache/airflow-steward`.
+
+---
+
+## Snapshot drift
+
+Also at the top of every run, this skill compares the
+gitignored `.apache-steward.local.lock` (per-machine
+fetch) against the committed `.apache-steward.lock`
+(the project pin). On mismatch the skill surfaces the
+gap and proposes
+[`/setup-steward upgrade`](../setup-steward/upgrade.md).
+The proposal is non-blocking — the user may defer if
+they want to run with the local snapshot for now. See
+[`docs/setup/install-recipes.md` § Subsequent runs and drift detection](../../../docs/setup/install-recipes.md#subsequent-runs-and-drift-detection)
+for the full flow.
+
+Drift severity:
+
+- **method or URL differ** -> ✗ full re-install needed.
+- **ref differs** (project bumped tag, or `git-branch`
+ local is behind upstream tip) -> ⚠ sync needed.
+- **`svn-zip` SHA-512 mismatches the committed
+ anchor** -> ✗ security-flagged; investigate before
+ upgrading.
+
+---
+
+## Prerequisites
+
+- `gh` authenticated with read access to `<tracker>` (and to
+ `<upstream>` for PR metadata, when configured).
+- `python3` (3.9+).
+- `jq` (used by `fetch_events.py` via gh's `--jq` flag).
+- Network access to `api.github.com` and (for *viewing* the output
+ HTML) Plotly's CDN.
+- Optional: PyYAML. When missing, the renderer falls back to a
+ bundled minimal YAML subset parser sufficient for
+ `default-config.yaml` and typical overlays.
+
+---
+
+## Inputs
+
+The skill accepts the following optional arguments:
+
+| Selector | Meaning |
+|---|---|
+| *(no args)* | render with all defaults — monthly buckets, default categories, the adopter's milestones |
+| `quarterly` / `monthly` | override the bucket granularity |
+| `<output-path>` | write the HTML to a specific path |
+| `clear-cache` | delete the fetch cache before fetching |
+| `since:YYYY-MM` / `since:YYYY-Qn` | override the start bucket |
+
+If the adopter passes nothing, surface the resolved output path and
+cache state up front so they can interrupt before a 5-10 minute
+fetch.
+
+---
+
+## How to invoke
+
+1. **Resolve config.** Read
+ [`<project-config>/security-tracker-stats.md`](../../../projects/_template/security-tracker-stats.md)
+ for the project's per-renderer YAML config path (default:
+ `<adopter-repo>/.apache-steward-overrides/security-tracker-stats.yaml`).
+ Surface to the user *which* config file will be applied and
+ *what bucket granularity* it resolves to. If the YAML file does
+ not exist, fall back silently to the framework's
+ `default-config.yaml`.
+
+2. **Check cache freshness.** Inspect
+ `${TRACKER_STATS_CACHE:-/tmp/tracker-stats-cache}/issues.json`
+ mtime. If older than 24 h, propose a fresh fetch; if missing or
+ the user passed `clear-cache`, do a fresh fetch unconditionally.
+
+3. **Run the orchestrator.** Substitute placeholders and invoke:
+
+ ```bash
+ TRACKER_STATS_REPO=<tracker> \
+ TRACKER_STATS_UPSTREAM_REPO=<upstream> \
+ TRACKER_STATS_CONFIG=<adopter-repo>/.apache-steward-overrides/security-tracker-stats.yaml \
+ bash <framework>/tools/security-tracker-stats-dashboard/run.sh <output-path>
+ ```
+
+ When the user passed `monthly` / `quarterly` or
+ `since:<start>`, prepend the matching `TRACKER_STATS_BUCKETS=` /
+ `TRACKER_STATS_START=` env vars.
+
+4. **Report the result.** Print the final HTML path and a short
+ summary (total trackers, open count, latest-bucket category
+ breakdown, triage-median, PR-merge-median when configured). The
+ pipeline already echoes most of this to stdout — pass it
+ through verbatim and add the clickable
+ `file://<output-path>` line at the end.
+
+The full pipeline:
+
+1. `fetch_issues.py` — `gh issue list --state all --limit 1000` ->
+ `<cache>/issues.json`.
+2. `fetch_roster.py` — `gh api repos/<tracker>/collaborators` ->
+ `<cache>/roster.txt`.
+3. `fetch_bodies.py` — per-issue `body` +
+ `closedByPullRequestsReferences` -> `<cache>/issue_extra.json`.
+4. `fetch_events.py` — per-issue label-history events ->
+ `<cache>/events/<N>.json`.
+5. `fetch_prs.py` — per-PR `createdAt` / `mergedAt` / `state` from
+ `<upstream>` -> `<cache>/prs.json`. Silent no-op when
+ `TRACKER_STATS_UPSTREAM_REPO` is empty or `none`.
+6. `render.py` — reads cache + config, writes HTML to
+ `$TRACKER_STATS_OUT`.
+
+Each fetch script resumes from cache, so re-running after a partial
+failure (rate limit, transient HTTP error) only re-fetches what is
+missing.
+
+---
+
+## Configuration overview
+
+See
+[`tools/security-tracker-stats-dashboard/default-config.yaml`](../../../tools/security-tracker-stats-dashboard/default-config.yaml)
+for the schema with inline documentation, and
+[`tools/security-tracker-stats-dashboard/README.md`](../../../tools/security-tracker-stats-dashboard/README.md)
+for the load order, predicate keys, and snapshot replay semantics.
+
+The most-overridden knobs by adopters tend to be:
+
+- **`buckets:`** — monthly vs. quarterly. Smaller tracker repos
+ (<50 issues / year) read better at quarterly granularity.
+- **`milestones:`** — vertical annotations marking process
+ changes the dashboard should highlight (skill adoption, team
+ handover, policy update). Set to `[]` to remove them.
+- **`scope_labels:`** — the project's primary "what does this
+ affect" axis. Defaults to `[airflow, providers, chart]`;
+ adopters use whatever scope-label set
+ [`<project-config>/scope-labels.md`](../../../projects/_template/scope-labels.md)
+ declares.
+- **`categories:`** — the lifecycle-band classification rules.
+ Defaults match the airflow-s reference implementation
+ byte-for-byte; adopters with different label conventions
+ (e.g. `triaged` instead of *no `needs triage`*) re-state the
+ whole list.
+- **`triage.keywords:`** / **`triage.bot_prefixes:`** — the
+ time-to-triage signal. Adopters whose security team uses
+ different phrasing in triage-proposal comments override these.
+
+---
+
+## Hard rules
+
+**Golden rule 1 — read only, never write.** The skill must not
+post comments, add labels, close, edit, or otherwise mutate any
+tracker, PR, or upstream resource. If the user asks for stats and
+also wants an action, decline the mutation.
+
+**Golden rule 2 — proposal-before-fetch on stale cache.** Before
+running a fresh full fetch (which costs ~5-10 minutes of `gh` API
+calls), surface the proposal and wait for explicit user
+confirmation. Incremental re-renders against a warm cache (~30
+seconds) can run without a prompt.
+
+**Golden rule 3 — never edit the snapshot.** As with every other
+skill, agentic overrides go in
+`.apache-steward-overrides/security-tracker-stats-dashboard.md`; renderer
+overrides go in the project's tracker-stats YAML config file. The
+gitignored snapshot under `.apache-steward/` is never modified.
+
+**Golden rule 4 — surface the config path on every run.** The
+dashboard's output depends entirely on which YAML file the renderer
+loaded. Print the resolved config path (or "default") as the first
+line of skill output so the user can tell at a glance whether their
+overlay is being picked up.
+
+---
+
+## Failure modes
+
+| Symptom | Cause | Fix |
+|---|---|---|
+| `events/<N>.json` missing for some N | gh transient failure during paginate | Re-run; `fetch_events.py` resumes from cache |
+| `prs.json` has `{"error": ...}` entries | False-positive body parse (PR# doesn't exist) | Silently filtered at render; safe to ignore |
+| `c_rel` median jumps after re-fetch | New advisory shipped since last run | Expected — re-render is correct |
+| Empty `c_prc` / `c_prm` / `c_rel` early buckets | No linked PR in those tracker buckets | Expected — not all early trackers had a fix PR |
+| Three PR charts missing entirely | `upstream_repo: null` in config (or env override) | By design — set `upstream_repo:` if you want them |
+| `ModuleNotFoundError: yaml` | PyYAML missing | Bundled fallback parser handles `default-config.yaml`; install pyyaml for richer overlays |
diff --git a/projects/_template/security-tracker-stats.md b/projects/_template/security-tracker-stats.md
new file mode 100644
index 0000000..ab7619a
--- /dev/null
+++ b/projects/_template/security-tracker-stats.md
@@ -0,0 +1,139 @@
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [security-tracker-stats.md (template)](#security-tracker-statsmd-template)
+ - [YAML config path](#yaml-config-path)
+ - [Default output path](#default-output-path)
+ - [Cache directory](#cache-directory)
+ - [Refresh cadence](#refresh-cadence)
+ - [Example overlay (`security-tracker-stats.yaml`)](#example-overlay-security-tracker-statsyaml)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# security-tracker-stats.md (template)
+
+Per-project configuration consumed by the
+[`security-tracker-stats-dashboard`](../../.claude/skills/security-tracker-stats-dashboard/SKILL.md)
+skill. Copy this file into your project's `<project-config>/`
+directory and edit the values below. Everything is optional — the
+skill falls back to
+[`tools/security-tracker-stats-dashboard/default-config.yaml`](../../tools/security-tracker-stats-dashboard/default-config.yaml)
+when a key is unset.
+
+## YAML config path
+
+```yaml
+tracker_stats_config: .apache-steward-overrides/security-tracker-stats.yaml
+```
+
+The renderer reads its configuration from the YAML file pointed at by
+the `TRACKER_STATS_CONFIG` env var. The skill resolves this from
+`tracker_stats_config:` above (interpreting it relative to the
+adopter repo root). Adopters who want the framework's defaults
+verbatim can leave this unset; the skill will skip the overlay step.
+
+The YAML schema is documented inline at
+[`tools/security-tracker-stats-dashboard/default-config.yaml`](../../tools/security-tracker-stats-dashboard/default-config.yaml).
+
+## Default output path
+
+```yaml
+tracker_stats_output: tmp/tracker_stats.html
+```
+
+The skill writes the rendered HTML to this path (relative to the
+adopter repo root, or absolute) when the user does not pass an
+explicit `<output-path>` argument. The
+`airflow-s/airflow-s` adopter uses `tmp/airflow_s_monthly.html`
+(committed into `tmp/` as the canonical artefact for security-team
+review).
+
+## Cache directory
+
+```yaml
+tracker_stats_cache: /tmp/tracker-stats-cache
+```
+
+Where the fetch scripts persist their cache. Safe to delete (forces a
+full re-fetch). The skill resolves this to the `TRACKER_STATS_CACHE`
+env var.
+
+## Refresh cadence
+
+```yaml
+tracker_stats_refresh_hours: 24
+```
+
+The skill considers the cache stale when `issues.json` is older than
+this many hours, and proposes a refresh before re-rendering. Lower
+this for fast-moving trackers; raise it for trackers where the
+dashboard is reviewed weekly or monthly.
+
+## Example overlay (`security-tracker-stats.yaml`)
+
+A minimal overlay that swaps to quarterly buckets and adds a
+project-specific milestone:
+
+```yaml
+buckets: quarterly
+
+milestones:
+ - date: 2026-04-20
+ label: skill adoption
+ - date: 2026-09-01
+ label: handover to PMC sec team
+```
+
+A bigger overlay that renames the scope labels for a non-Airflow
+adopter and removes the upstream-PR charts entirely (because fixes
+land in many repos, not a single `<upstream>`):
+
+```yaml
+upstream_repo: null
+
+scope_labels: [core, plugins, docs]
+
+milestones: []
+
+# Re-state the full categories list to align with the project's
+# label conventions. The framework's default categories assume
+# `needs triage`, `pr merged`, `fix released`, `announced - emails
+# sent`, `cve allocated` — projects with different label vocabularies
+# need to re-state predicates explicitly.
+categories:
+ - name: fixed_released
+ color: "#2ca02c"
+ predicate:
+ any_of:
+ - any_label: [released]
+ - all_of:
+ state: closed
+ state_reason: COMPLETED
+ any_label: [security-fix]
+ - name: closed_other
+ color: "#888888"
+ predicate:
+ state: closed
+ - name: open_untriaged
+ color: "#d62728"
+ predicate:
+ all_of:
+ state: open
+ any_of:
+ - any_label: [needs triage]
+ - no_scope_label: true
+ - name: open_pr_merged
+ color: "#e67e22"
+ predicate:
+ all_of:
+ state: open
+ any_label: [pr merged]
+ - name: open_triaged
+ color: "#f1c40f"
+ predicate:
+ state: open
+```
diff --git a/tools/security-tracker-stats-dashboard/README.md b/tools/security-tracker-stats-dashboard/README.md
new file mode 100644
index 0000000..06dcced
--- /dev/null
+++ b/tools/security-tracker-stats-dashboard/README.md
@@ -0,0 +1,181 @@
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [security-tracker-stats-dashboard](#security-tracker-stats-dashboard)
+ - [Layout](#layout)
+ - [Invocation](#invocation)
+ - [Resume behaviour](#resume-behaviour)
+ - [Configuration](#configuration)
+ - [Categories (lifecycle bands)](#categories-lifecycle-bands)
+ - [Time-to-triage signal](#time-to-triage-signal)
+ - [Milestones (vertical annotations)](#milestones-vertical-annotations)
+ - [When `upstream_repo` is null](#when-upstream_repo-is-null)
+ - [Prerequisites](#prerequisites)
+ - [Failure modes](#failure-modes)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# security-tracker-stats-dashboard
+
+Generate a self-contained HTML dashboard of `<tracker>` repository
+statistics — issue-lifecycle bands (untriaged / triaged / PR-merged /
+fixed-released / closed-other), opened-vs-untriaged backlog, cumulative
+opened/closed, and mean-time-to-triage / first-response / PR-open /
+PR-merge / advisory-announced.
+
+All charts are line / area (no bars) with `connectgaps: true`. Plotly
+is loaded via CDN — the output HTML is self-contained but viewing it
+requires network access for the chart library.
+
+The tool is **read-only on GitHub** — it does not create or modify
+issues, comments, labels, or PRs. It only fetches data via `gh` and
+renders an HTML file.
+
+The companion agentic skill at
+[`.claude/skills/security-tracker-stats-dashboard/SKILL.md`](../../.claude/skills/security-tracker-stats-dashboard/SKILL.md)
+wraps this tool and surfaces it through Claude Code's slash-command
+interface; both routes (script-only and skill-driven) run the same
+fetch + render pipeline.
+
+## Layout
+
+```text
+tools/security-tracker-stats-dashboard/
+├── README.md (this file)
+├── default-config.yaml (config schema + adopter-overridable defaults)
+├── render.py (renders cached data to HTML; reads config)
+├── fetch_issues.py (gh issue list -> issues.json)
+├── fetch_roster.py (gh api collaborators -> roster.txt)
+├── fetch_bodies.py (per-issue body + closedByPRs -> issue_extra.json)
+├── fetch_events.py (per-issue label history -> events/<N>.json)
+├── fetch_prs.py (per-PR metadata from <upstream> -> prs.json)
+└── run.sh (orchestrator)
+```
+
+## Invocation
+
+```bash
+bash <framework>/tools/security-tracker-stats-dashboard/run.sh [<output-path>]
+```
+
+Env knobs (all optional):
+
+| Var | Default | Notes |
+|---|---|---|
+| `TRACKER_STATS_REPO` | *(e.g. `airflow-s/airflow-s`)* | `<tracker>` repo slug |
+| `TRACKER_STATS_OUT` | `/tmp/airflow_s_monthly.html` | output HTML path |
+| `TRACKER_STATS_CACHE` | `/tmp/tracker-stats-cache` | fetch cache dir |
+| `TRACKER_STATS_CONFIG` | *(unset)* | path to a YAML overlay file |
+| `TRACKER_STATS_BUCKETS` | *(from config: `monthly`)* | `monthly` or `quarterly` |
+| `TRACKER_STATS_START` | *(from config: `null`)* | `YYYY-MM` or `YYYY-Qn` |
+| `TRACKER_STATS_UPSTREAM_REPO` | *(from config; e.g. `apache/airflow`)* | `<upstream>` repo slug; `none` skips PR charts |
+
+### Resume behaviour
+
+Each fetch script resumes from cache, so re-running after a partial
+failure (rate limit, transient HTTP error) only re-fetches what is
+missing. Delete the cache dir to force a fresh full fetch.
+
+Fetches are parallelised (`ThreadPoolExecutor`, ~10 workers). A fresh
+run is ~5–10 minutes on a 250-issue tracker; incremental re-renders
+(cache warm) are ~30 seconds.
+
+## Configuration
+
+`render.py` loads configuration in this order, highest priority last:
+
+1. `default-config.yaml` (in this directory).
+2. `$TRACKER_STATS_CONFIG` overlay YAML, when set (typically
+ `<adopter-repo>/.apache-steward-overrides/security-tracker-stats.yaml`).
+ Deep-merged with the default. **The `milestones` and `categories`
+ lists are REPLACED entirely** (not concatenated) — overlaying a
+ single category requires re-stating the whole list.
+3. Env-var quick overrides for the most common knobs:
+ `TRACKER_STATS_BUCKETS`, `TRACKER_STATS_START`,
+ `TRACKER_STATS_UPSTREAM_REPO`.
+
+See [`default-config.yaml`](default-config.yaml) for the full schema
+with inline documentation of every predicate key.
+
+### Categories (lifecycle bands)
+
+Mutually-exclusive states per tracker at each bucket-end snapshot,
+evaluated **top-to-bottom** with first-match-wins. Multiple rules can
+share a `name` to express disjoint branches of the same final
+category — the default set uses this for the `open / closed`
+fork on `fixed_released`. The set of distinct names defines the
+stack order in the lifecycle chart (overridable via the
+`stack_order:` config key).
+
+Supported predicate keys:
+
+| Key | Meaning |
+|---|---|
+| `state` | `open` / `closed` |
+| `state_reason` | `COMPLETED` / `NOT_PLANNED` / `REOPENED` / `null` |
+| `any_label` | at least one of the listed labels is present |
+| `all_labels` | every label in the list is present |
+| `not_label` | the named label must NOT be present |
+| `not_any_label` | none of the listed labels present |
+| `no_scope_label` (`true`/`false`) | tracker carries none of `scope_labels` |
+| `has_scope_label` (`true`/`false`) | tracker carries at least one of `scope_labels` |
+| `pr_merged_by_snapshot` (`true`/`false`) | a linked `<upstream>` PR is merged by the snapshot timestamp |
+| `any_of` / `all_of` | logical combinators (nestable) |
+
+Snapshot reconstruction replays each tracker's event stream
+(labeled / unlabeled / closed / reopened) chronologically from
+`{labels: [], state: OPEN}` at `createdAt`, evaluated at the
+bucket-end timestamp (Mar 31 / Jun 30 / Sep 30 / Dec 31 at 23:59:59 UTC
+for quarterly; calendar-month last day for monthly).
+
+### Time-to-triage signal
+
+First tracker comment whose author is on the roster (from
+`fetch_roster.py`) AND whose body matches any
+`triage.keywords[]` regex (case-insensitive). Falls back to
+the **first non-bot roster comment** when no keyword matches
+(useful for older trackers that predate the team's triage-comment
+convention). The `triage.bot_prefixes[]` list skips automated
+rollup / sync / import comments.
+
+### Milestones (vertical annotations)
+
+`milestones[]` produces a vertical dashed line + top-label annotation
+on every time-axis chart. Each entry needs `date: YYYY-MM-DD` (mapped
+onto the bucket axis) and `label`. Set `milestones: []` in an overlay
+to remove them entirely.
+
+### When `upstream_repo` is null
+
+The `c_prc` / `c_prm` / `c_rel` PR-driven mean-time charts are
+omitted, the `fetch_prs.py` stage is a silent no-op, and the
+`pr_merged_by_snapshot` predicate is always false (so the
+`open_pr_merged` snapshot back-fill rule is disabled). The
+remaining charts still render.
+
+## Prerequisites
+
+- `gh` authenticated with read access to `<tracker>` (and to
+ `<upstream>` for PR metadata, when configured).
+- `python3` (3.9+).
+- `jq` (only used by the fetch scripts via gh's `--jq` flag).
+- Network access to `api.github.com` and (for viewing) Plotly's CDN.
+- Optional: `pyyaml`. When missing, `render.py` falls back to a
+ bundled minimal YAML subset parser sufficient for
+ `default-config.yaml` and typical overlays. To pin a clean PyYAML
+ invocation, set `TRACKER_STATS_PY=uv-yaml` and the orchestrator
+ runs every step under `uv run --with pyyaml`.
+
+## Failure modes
+
+| Symptom | Cause | Fix |
+|---|---|---|
+| `events/<N>.json` missing for some N | gh transient failure during paginate | Re-run `run.sh`; `fetch_events.py` resumes from cache |
+| `prs.json` has `{"error": ...}` entries | False-positive body parse (PR# doesn't exist) | Silently filtered at render; safe to ignore |
+| `c_rel` median jumps after re-fetch | New advisory shipped since last run | Expected — re-render is correct |
+| Empty `c_prc` / `c_prm` / `c_rel` early buckets | No linked PR in those tracker buckets | Expected — not all early trackers had a fix PR |
+| `ModuleNotFoundError: yaml` | PyYAML missing | The bundled fallback parser handles `default-config.yaml`; for richer overlays install pyyaml or use `TRACKER_STATS_PY=uv-yaml` |
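The first-match-wins category evaluation described in the README above can be sketched roughly as follows. This is one illustrative reading, not the code in `render.py`: the `matches` and `classify` helpers and the shape of the `tracker` snapshot dict are assumptions, `pr_merged_by_snapshot` is left out, and the rule list is a shortened subset of the defaults.

```python
def matches(pred, tracker, scope_labels):
    """True when every condition in `pred` holds for the tracker snapshot."""
    labels = set(tracker["labels"])
    in_scope = bool(labels & set(scope_labels))
    for key, want in pred.items():
        if key == "state":
            ok = tracker["state"].lower() == want
        elif key == "state_reason":
            ok = tracker.get("state_reason") == want
        elif key == "any_label":
            ok = bool(labels & set(want))
        elif key == "all_labels":
            ok = set(want) <= labels
        elif key == "not_label":
            ok = want not in labels
        elif key == "not_any_label":
            ok = not (labels & set(want))
        elif key == "no_scope_label":
            ok = (not in_scope) == want
        elif key == "has_scope_label":
            ok = in_scope == want
        elif key == "all_of":
            # The YAML examples write all_of as a mapping; accept a list too.
            subs = want if isinstance(want, list) else [want]
            ok = all(matches(p, tracker, scope_labels) for p in subs)
        elif key == "any_of":
            ok = any(matches(p, tracker, scope_labels) for p in want)
        else:
            raise ValueError(f"unsupported predicate key: {key}")
        if not ok:
            return False
    return True


def classify(tracker, categories, scope_labels):
    """Return the name of the first rule whose predicate matches, else None."""
    for rule in categories:
        if matches(rule["predicate"], tracker, scope_labels):
            return rule["name"]
    return None


SCOPE = ["airflow", "providers", "chart"]
CATEGORIES = [
    {"name": "closed_other", "predicate": {"state": "closed"}},
    {"name": "open_untriaged", "predicate": {"all_of": {
        "state": "open",
        "any_of": [{"any_label": ["needs triage"]}, {"no_scope_label": True}],
    }}},
    {"name": "open_triaged", "predicate": {"state": "open"}},
]

band = classify({"state": "OPEN", "labels": ["airflow"]}, CATEGORIES, SCOPE)
```

Because rules are tried in order, the catch-all `state: open` rule only fires for open trackers that no earlier rule claimed, which is why rule order matters more than rule count.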
diff --git a/tools/security-tracker-stats-dashboard/default-config.yaml b/tools/security-tracker-stats-dashboard/default-config.yaml
new file mode 100644
index 0000000..12edaad
--- /dev/null
+++ b/tools/security-tracker-stats-dashboard/default-config.yaml
@@ -0,0 +1,135 @@
+# Default configuration for security-tracker-stats-dashboard.
+#
+# All knobs here can be overridden either by a YAML file pointed at by
+# the `TRACKER_STATS_CONFIG` env var (deep-merged with this default; the
+# `milestones` and `categories` lists are REPLACED entirely, not
+# concatenated), or by env-var quick overrides for the most common knobs
+# (`TRACKER_STATS_BUCKETS`, `TRACKER_STATS_START`, `TRACKER_STATS_UPSTREAM_REPO`).
+#
+# Defaults below match the reference `airflow-s/airflow-s` dashboard
+# byte-for-byte.
+
+buckets: monthly # monthly | quarterly
+start: null # null = first tracker createdAt; else "YYYY-MM" (monthly) or "YYYY-Qn" (quarterly)
+upstream_repo: apache/airflow # null -> skip c_prc/c_prm/c_rel charts and the back-fill rule
+
+milestones:
+ - date: 2026-04-20
+ label: skill adoption
+
+scope_labels: [airflow, providers, chart]
+
+# Categories - evaluated top-to-bottom, FIRST MATCH WINS. Multiple rules
+# can share the same `name` (and `color`) to express disjoint branches
+# of the same final category. The set of distinct names defines the
+# stacked-band order in the dashboard's lifecycle chart (preserved in
+# the order they FIRST appear in this list).
+#
+# Each predicate is conjunctive: ALL conditions must match. Supported keys:
+# state open | closed
+# state_reason COMPLETED | NOT_PLANNED | REOPENED | null
+# any_label list - at least one of these labels present
+# all_labels list - every label in this list present
+# not_label single label - must NOT be present
+# not_any_label list - none of these labels present
+# no_scope_label true - tracker has none of the scope_labels
+# has_scope_label true - tracker has at least one scope_label
+# pr_merged_by_snapshot true - a linked upstream PR is merged at snapshot time
+# Logical combinators: any_of / all_of (nest as deep as you need).
+categories:
+ # --- Closed branch (mirrors `if not is_open:` in the reference). ----
+ - name: fixed_released
+ color: "#2ca02c"
+ predicate:
+ all_of:
+ state: closed
+ any_of:
+ - any_label: [fix released, "announced - emails sent", announced]
+ - all_of:
+ state_reason: COMPLETED
+ any_label: [cve allocated]
+ - name: closed_other
+ color: "#888888"
+ predicate:
+ state: closed
+
+ # --- Open branch (mirrors the reference's open-branch order). -------
+ - name: open_untriaged
+ color: "#d62728"
+ predicate:
+ all_of:
+ state: open
+ any_of:
+ - any_label: [needs triage]
+ - no_scope_label: true
+ # PR-merge-by-snapshot back-fill: an upstream PR has merged by the
+ # snapshot timestamp. Captures historical trackers that predate the
+ # `pr merged` label convention.
+ - name: open_pr_merged
+ color: "#e67e22"
+ predicate:
+ all_of:
+ state: open
+ pr_merged_by_snapshot: true
+ not_label: fix released
+ - name: fixed_released
+ color: "#2ca02c"
+ predicate:
+ all_of:
+ state: open
+ pr_merged_by_snapshot: true
+ any_label: [fix released]
+ - name: open_pr_merged
+ color: "#e67e22"
+ predicate:
+ all_of:
+ state: open
+ any_label: [pr merged]
+ not_label: fix released
+ - name: fixed_released
+ color: "#2ca02c"
+ predicate:
+ all_of:
+ state: open
+ any_label: [fix released, "announced - emails sent", announced]
+ - name: open_triaged
+ color: "#f1c40f"
+ predicate:
+ state: open
+
+# The order in which distinct category names FIRST appear above is the
+# stacked-band order top-to-bottom in the lifecycle chart. For the
+# defaults that resolves to: fixed_released, closed_other,
+# open_untriaged, open_pr_merged, open_triaged. The reference dashboard
+# uses a different stack order, however, so we re-pin it here:
+stack_order:
+ - fixed_released
+ - open_pr_merged
+ - open_triaged
+ - open_untriaged
+ - closed_other
+
+triage:
+ keywords:
+ - triage proposal
+ - proposed disposition
+ - VALID
+ - INVALID
+ - DEFENSE-IN-DEPTH
+ - INFO-ONLY
+ - PROBABLE-DUP
+ - looks like a valid security issue
+ - not a security issue
+ - not a vulnerability
+ - out of scope
+ - out-of-scope
+ - agreed
+ - Security Model
+ - cve-worthy
+ - CVE-worthy
+ bot_prefixes:
+ - "<!-- airflow-s status rollup v"
+ - "**Sync "
+ - "**Imported on "
+ - "**Status update"
+ - "**Allocated CVE"
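The time-to-triage signal that these `triage.keywords` and `triage.bot_prefixes` entries feed can be sketched as below, following the README's description (first roster comment matching a keyword regex, case-insensitive, else the first non-bot roster comment). The `first_triage_comment` helper and the comment dict shape are hypothetical; the sketch also assumes comments arrive in chronological order and that bot-prefixed comments are skipped in both passes.

```python
import re


def first_triage_comment(comments, roster, keywords, bot_prefixes):
    """Return the comment treated as the triage signal, or None."""
    patterns = [re.compile(k, re.IGNORECASE) for k in keywords]
    # Keep only roster-authored comments that do not look automated.
    candidates = [
        c for c in comments
        if c["author"] in roster
        and not any(c["body"].startswith(prefix) for prefix in bot_prefixes)
    ]
    # Pass 1: earliest candidate whose body matches any keyword regex.
    for comment in candidates:
        if any(p.search(comment["body"]) for p in patterns):
            return comment
    # Pass 2: fall back to the earliest non-bot roster comment.
    return candidates[0] if candidates else None


# Hypothetical sample data, oldest comment first.
comments = [
    {"author": "reporter", "body": "triage proposal: VALID"},
    {"author": "alice", "body": "**Sync 2026-01-01: mirrored from mail"},
    {"author": "alice", "body": "Thanks, taking a look."},
    {"author": "bob", "body": "Triage Proposal: looks exploitable"},
]
roster = {"alice", "bob"}
hit = first_triage_comment(comments, roster, ["triage proposal"], ["**Sync "])
```

Here the reporter's comment is ignored (not on the roster), the sync comment is skipped as automated, and the keyword match from `bob` wins over the earlier plain reply from `alice`; with an empty keyword list the fallback would pick that earlier reply instead.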
diff --git a/tools/security-tracker-stats-dashboard/fetch_bodies.py b/tools/security-tracker-stats-dashboard/fetch_bodies.py
new file mode 100644
index 0000000..dccfd6b
--- /dev/null
+++ b/tools/security-tracker-stats-dashboard/fetch_bodies.py
@@ -0,0 +1,56 @@
+#!/usr/bin/env python3
+"""Fetch issue body + closedByPullRequestsReferences for every tracker
+issue and cache to <TRACKER_STATS_CACHE>/issue_extra.json."""
+
+import json
+import os
+import subprocess
+from concurrent.futures import ThreadPoolExecutor, as_completed
+
+ROOT = os.environ.get('TRACKER_STATS_CACHE', '/tmp/tracker-stats-cache')
+REPO = os.environ.get('TRACKER_STATS_REPO', 'airflow-s/airflow-s')
+OUT = f'{ROOT}/issue_extra.json'
+
+with open(f'{ROOT}/issues.json') as f:
+    issues = json.load(f)
+
+# Resume support
+cache = {}
+if os.path.exists(OUT):
+    with open(OUT) as f:
+        cache = json.load(f)
+    print(f"resume: {len(cache)} cached")
+
+todo = [i['number'] for i in issues if str(i['number']) not in cache]
+print(f"to fetch: {len(todo)}")
+
+
+def fetch(n):
+    try:
+        r = subprocess.run(
+            ['gh', 'issue', 'view', str(n), '--repo', REPO,
+             '--json', 'number,body,closedByPullRequestsReferences'],
+            capture_output=True, text=True, timeout=60,
+        )
+        if r.returncode != 0:
+            return n, {'error': r.stderr.strip()}
+        return n, json.loads(r.stdout)
+    except Exception as e:
+        return n, {'error': str(e)}
+
+
+done = 0
+with ThreadPoolExecutor(max_workers=10) as ex:
+    futs = {ex.submit(fetch, n): n for n in todo}
+    for fut in as_completed(futs):
+        n, data = fut.result()
+        cache[str(n)] = data
+        done += 1
+        if done % 25 == 0:
+            with open(OUT, 'w') as f:
+                json.dump(cache, f)
+            print(f" {done}/{len(todo)}")
+
+with open(OUT, 'w') as f:
+    json.dump(cache, f)
+print(f"done: cached {len(cache)} → {OUT}")
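Note that the resume logic in fetch_bodies.py skips every number already present in the cache, including entries that only recorded an `error`. If retrying failed fetches on the next run is wanted, the selection could look like the sketch below; `numbers_to_fetch` is a hypothetical helper, not part of this commit.

```python
def numbers_to_fetch(issue_numbers, cache):
    """Return numbers that are missing from the cache or cached as errors."""
    todo = []
    for n in issue_numbers:
        entry = cache.get(str(n))  # cache keys are stringified issue numbers
        if entry is None or "error" in entry:
            todo.append(n)
    return todo
```

The trade-off is that permanently failing numbers are re-queried on every run.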
diff --git a/tools/security-tracker-stats-dashboard/fetch_events.py b/tools/security-tracker-stats-dashboard/fetch_events.py
new file mode 100644
index 0000000..a496b67
--- /dev/null
+++ b/tools/security-tracker-stats-dashboard/fetch_events.py
@@ -0,0 +1,61 @@
+#!/usr/bin/env python3
+"""Fetch per-issue label-history events. Resumes from cache."""
+
+import json
+import subprocess
+import concurrent.futures
+import os
+
+ROOT = os.environ.get('TRACKER_STATS_CACHE', '/tmp/tracker-stats-cache')
+REPO = os.environ.get('TRACKER_STATS_REPO', 'airflow-s/airflow-s')
+EVENTS_DIR = f'{ROOT}/events'
+
+with open(f'{ROOT}/issues.json') as f:
+    issues = json.load(f)
+
+numbers = [i['number'] for i in issues]
+print(f"Fetching events for {len(numbers)} issues...")
+
+os.makedirs(EVENTS_DIR, exist_ok=True)
+
+def fetch_one(n):
+    out_path = f'{EVENTS_DIR}/{n}.json'
+    if os.path.exists(out_path) and os.path.getsize(out_path) > 0:
+        return (n, True, 'cached')
+    try:
+        r = subprocess.run(
+            ['gh', 'api', f'repos/{REPO}/issues/{n}/events',
+             '--paginate',
+             '--jq', '[.[] | select(.event == "labeled" or .event == "unlabeled" or .event == "closed" or .event == "reopened") | {event, label: (.label.name // null), created_at}]'],
+            capture_output=True, text=True, timeout=60
+        )
+        if r.returncode != 0:
+            return (n, False, r.stderr[:200])
+        out = r.stdout.strip()
+        decoder = json.JSONDecoder()
+        idx = 0
+        merged = []
+        while idx < len(out):
+            while idx < len(out) and out[idx] in ' \n\r\t':
+                idx += 1
+            if idx >= len(out):
+                break
+            obj, n2 = decoder.raw_decode(out, idx)
+            merged.extend(obj)
+            idx = n2
+        with open(out_path, 'w') as f:
+            json.dump(merged, f)
+        return (n, True, f'{len(merged)} events')
+    except Exception as e:
+        return (n, False, str(e)[:200])
+
+with concurrent.futures.ThreadPoolExecutor(max_workers=10) as ex:
+    results = list(ex.map(fetch_one, numbers))
+
+ok = sum(1 for _, ok, _ in results if ok)
+fail = [(n, msg) for n, ok, msg in results if not ok]
+print(f"Done: {ok}/{len(numbers)} OK")
+if fail:
+    print("FAILURES:")
+    for n, msg in fail[:20]:
+        print(f" #{n}: {msg}")
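The decode loop in fetch_events.py appears to exist because `gh api --paginate` combined with `--jq` applies the filter once per page, so stdout can hold several JSON arrays back to back rather than one document. A standalone sketch of the same technique, with `decode_concatenated_arrays` as an illustrative name:

```python
import json


def decode_concatenated_arrays(text):
    """Merge back-to-back JSON arrays (one per page) into a single list."""
    decoder = json.JSONDecoder()
    merged = []
    idx = 0
    while idx < len(text):
        # raw_decode() does not skip leading whitespace, so do it by hand.
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            break
        # raw_decode() returns the parsed value and the index just past it.
        page, idx = decoder.raw_decode(text, idx)
        merged.extend(page)
    return merged
```

A plain `json.loads` on the same text would raise on the second array, which is why the offset-based `raw_decode` is used instead.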
diff --git a/tools/security-tracker-stats-dashboard/fetch_issues.py b/tools/security-tracker-stats-dashboard/fetch_issues.py
new file mode 100644
index 0000000..e1587ce
--- /dev/null
+++ b/tools/security-tracker-stats-dashboard/fetch_issues.py
@@ -0,0 +1,26 @@
+#!/usr/bin/env python3
+"""Dump all tracker issues (state=all, no PRs) to <cache>/issues.json."""
+
+import json
+import os
+import subprocess
+
+ROOT = os.environ.get('TRACKER_STATS_CACHE', '/tmp/tracker-stats-cache')
+REPO = os.environ.get('TRACKER_STATS_REPO', 'airflow-s/airflow-s')
+
+os.makedirs(ROOT, exist_ok=True)
+
+print(f"Fetching issue list from {REPO} (state=all, limit 1000) ...")
+r = subprocess.run(
+    ['gh', 'issue', 'list', '--repo', REPO, '--state', 'all', '--limit', '1000',
+     '--json', 'number,title,state,stateReason,createdAt,closedAt,labels,comments'],
+    capture_output=True, text=True, timeout=300,
+)
+if r.returncode != 0:
+    raise SystemExit(f"gh failed: {r.stderr}")
+
+issues = json.loads(r.stdout)
+with open(f'{ROOT}/issues.json', 'w') as f:
+    json.dump(issues, f)
+
+print(f"Wrote {len(issues)} issues to {ROOT}/issues.json")
diff --git a/tools/security-tracker-stats-dashboard/fetch_prs.py b/tools/security-tracker-stats-dashboard/fetch_prs.py
new file mode 100644
index 0000000..8ed7ca9
--- /dev/null
+++ b/tools/security-tracker-stats-dashboard/fetch_prs.py
@@ -0,0 +1,103 @@
+#!/usr/bin/env python3
+"""Fetch createdAt + mergedAt + state for every upstream-repo PR referenced
+by any tracker (via closedByPullRequestsReferences or body parse). Cache to
+`<TRACKER_STATS_CACHE>/prs.json`.
+
+The upstream repo is `$TRACKER_STATS_UPSTREAM_REPO` (default
+`apache/airflow`); set to `none` / `""` to skip this fetch entirely."""
+
+import json
+import os
+import re
+import subprocess
+from concurrent.futures import ThreadPoolExecutor, as_completed
+
+ROOT = os.environ.get('TRACKER_STATS_CACHE', '/tmp/tracker-stats-cache')
+UPSTREAM = os.environ.get('TRACKER_STATS_UPSTREAM_REPO', 'apache/airflow')
+if UPSTREAM in ('', 'none', 'null'):
+    print('TRACKER_STATS_UPSTREAM_REPO is empty/none - skipping PR fetch.')
+    raise SystemExit(0)
+
+EXTRA = f'{ROOT}/issue_extra.json'
+OUT = f'{ROOT}/prs.json'
+
+with open(EXTRA) as f:
+    extra = json.load(f)
+
+PR_PAT = re.compile(
+    rf'{re.escape(UPSTREAM)}#(\d+)|https://github\.com/{re.escape(UPSTREAM)}/pull/(\d+)',
+    re.I,
+)
+
+
+def extract_prs(v):
+    nums = set()
+    cb = v.get('closedByPullRequestsReferences') or []
+    for ref in cb:
+        if ref.get('repository', {}).get('nameWithOwner') == UPSTREAM:
+            nums.add(ref['number'])
+    body = v.get('body') or ''
+    # The body is not narrowed to the "PR with the fix" field first:
+    # upstream-repo PR mentions are accepted anywhere in the body
+    # (the spec allows either).
+    for m in PR_PAT.findall(body):
+        n = m[0] or m[1]
+        if n:
+            nums.add(int(n))
+    return nums
+
+
+# Build issue -> PR set + collect all unique PRs
+issue_to_prs = {}
+all_prs = set()
+for issue_n, v in extra.items():
+    prs = extract_prs(v)
+    issue_to_prs[issue_n] = sorted(prs)
+    all_prs.update(prs)
+
+# Save the issue_to_prs linkage map alongside
+with open(f'{ROOT}/issue_to_prs.json', 'w') as f:
+    json.dump(issue_to_prs, f)
+print(f"unique {UPSTREAM} PRs to fetch: {len(all_prs)}")
+
+# Resume support
+cache = {}
+if os.path.exists(OUT):
+    with open(OUT) as f:
+        cache = json.load(f)
+    print(f"resume: {len(cache)} cached")
+
+todo = [n for n in all_prs if str(n) not in cache]
+print(f"to fetch: {len(todo)}")
+
+
+def fetch(n):
+    try:
+        r = subprocess.run(
+            ['gh', 'pr', 'view', str(n), '--repo', UPSTREAM,
+             '--json', 'number,createdAt,mergedAt,state'],
+            capture_output=True, text=True, timeout=60,
+        )
+        if r.returncode != 0:
+            return n, {'error': r.stderr.strip()}
+        return n, json.loads(r.stdout)
+    except Exception as e:
+        return n, {'error': str(e)}
+
+
+done = 0
+with ThreadPoolExecutor(max_workers=12) as ex:
+    futs = {ex.submit(fetch, n): n for n in todo}
+    for fut in as_completed(futs):
+        n, data = fut.result()
+        cache[str(n)] = data
+        done += 1
+        if done % 25 == 0:
+            with open(OUT, 'w') as f:
+                json.dump(cache, f)
+            print(f" {done}/{len(todo)}")
+
+with open(OUT, 'w') as f:
+    json.dump(cache, f)
+errs = sum(1 for v in cache.values() if 'error' in v)
+print(f"done: cached {len(cache)} PRs ({errs} errors) → {OUT}")
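The body parse in fetch_prs.py accepts two reference styles for the upstream repo: the short `owner/repo#N` form and the full pull-request URL. A small sketch of that regex in isolation, assuming `apache/airflow` as the upstream slug; `pr_numbers` is an illustrative name.

```python
import re

UPSTREAM = "apache/airflow"  # assumed upstream slug, matching the script default

PR_PAT = re.compile(
    rf"{re.escape(UPSTREAM)}#(\d+)"
    rf"|https://github\.com/{re.escape(UPSTREAM)}/pull/(\d+)",
    re.I,
)


def pr_numbers(body):
    """Collect PR numbers from either reference style."""
    nums = set()
    # findall() yields one (short_ref, url_ref) tuple per match; exactly
    # one of the two groups is non-empty.
    for short_ref, url_ref in PR_PAT.findall(body or ""):
        nums.add(int(short_ref or url_ref))
    return nums
```

Links to `/issues/N` are not matched by the URL branch, but a short `owner/repo#N` reference is ambiguous between an issue and a PR, so some collected numbers may fail the later `gh pr view` call and be cached as errors.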
diff --git a/tools/security-tracker-stats-dashboard/fetch_roster.py b/tools/security-tracker-stats-dashboard/fetch_roster.py
new file mode 100644
index 0000000..2ef0f66
--- /dev/null
+++ b/tools/security-tracker-stats-dashboard/fetch_roster.py
@@ -0,0 +1,24 @@
+#!/usr/bin/env python3
+"""Dump the security-team roster (tracker repo's collaborators) to <cache>/roster.txt."""
+
+import os
+import subprocess
+
+ROOT = os.environ.get('TRACKER_STATS_CACHE', '/tmp/tracker-stats-cache')
+REPO = os.environ.get('TRACKER_STATS_REPO', 'airflow-s/airflow-s')
+
+os.makedirs(ROOT, exist_ok=True)
+
+r = subprocess.run(
+    ['gh', 'api', f'repos/{REPO}/collaborators', '--jq', '.[].login', '--paginate'],
+    capture_output=True, text=True, timeout=60,
+)
+if r.returncode != 0:
+    raise SystemExit(f"gh failed: {r.stderr}")
+
+logins = [ln.strip() for ln in r.stdout.splitlines() if ln.strip()]
+with open(f'{ROOT}/roster.txt', 'w') as f:
+    for ln in sorted(set(logins)):
+        f.write(ln + '\n')
+
+print(f"Wrote {len(set(logins))} roster handles to {ROOT}/roster.txt")
diff --git a/tools/security-tracker-stats-dashboard/render.py b/tools/security-tracker-stats-dashboard/render.py
new file mode 100644
index 0000000..ccf971f
--- /dev/null
+++ b/tools/security-tracker-stats-dashboard/render.py
@@ -0,0 +1,1121 @@
+#!/usr/bin/env python3
+"""
+Regenerate a tracker-stats dashboard. Reads cached issues+events+PR data
+from `$TRACKER_STATS_CACHE` (default `/tmp/tracker-stats-cache`) and writes
+a self-contained HTML page to `$TRACKER_STATS_OUT`.
+
+Configuration is loaded from `default-config.yaml` (next to this script),
+optionally overlaid by a YAML file at `$TRACKER_STATS_CONFIG` (deep-merged;
+the `milestones` and `categories` lists are REPLACED entirely, not
+concatenated), then overlaid by these env-var quick overrides:
+
+    TRACKER_STATS_BUCKETS        monthly | quarterly
+    TRACKER_STATS_START          "YYYY-MM" (monthly) or "YYYY-Qn" (quarterly)
+    TRACKER_STATS_UPSTREAM_REPO  upstream repo slug (or "" / "none" to skip PR charts)
+    TRACKER_STATS_REPO           tracker repo slug (operational)
+    TRACKER_STATS_OUT            output path
+    TRACKER_STATS_CACHE          cache dir
+    TRACKER_STATS_CONFIG         path to a YAML overlay file
+
+Defaults match the reference `airflow-s/airflow-s` dashboard byte-for-byte.
+
+Mean-time charts (createdAt -> PR opened, PR opened -> PR merged, PR merged
+-> advisory announced) use real PR timestamps from the configured upstream
+repo, not the `pr created` / `pr merged` label-add events (those labels were
+only adopted in late 2025, so relying on them would erase pre-2026 history).
+When `upstream_repo` is null, those three charts are omitted and the snapshot
+back-fill rule is disabled.
+"""
+
+import calendar
+import json
+import os
+import re
+import statistics
+import datetime as dt
+from collections import defaultdict
+
+# --- YAML loader ----------------------------------------------------
+# Prefer pyyaml when available (handles every edge case). When it's not
+# installed, fall back to a tiny subset parser that covers the schema in
+# default-config.yaml only.
+try:
+    import yaml  # type: ignore
+
+    def yaml_load(text):
+        return yaml.safe_load(text)
+
+except ImportError:
+    def yaml_load(text):
+        return _minimal_yaml_load(text)
+
+
+def _minimal_yaml_load(text):
+    """Tiny YAML subset parser sufficient for default-config.yaml.
+
+    Supports: nested block mappings, block sequences (`- ...`), inline
+    flow lists `[a, b, "c d"]`, string scalars (with optional quotes),
+    integers, floats, booleans, null. Comments start at `#` outside of
+    quoted strings. No anchors, no merge keys, no flow mappings.
+    """
+    lines = []
+    for raw in text.splitlines():
+        # Strip comments outside of quotes.
+        in_q = None
+        out = []
+        i = 0
+        while i < len(raw):
+            ch = raw[i]
+            if in_q:
+                out.append(ch)
+                if ch == '\\' and i + 1 < len(raw):
+                    out.append(raw[i + 1])
+                    i += 2
+                    continue
+                if ch == in_q:
+                    in_q = None
+                i += 1
+                continue
+            if ch in ('"', "'"):
+                in_q = ch
+                out.append(ch)
+                i += 1
+                continue
+            if ch == '#':
+                break
+            out.append(ch)
+            i += 1
+        line = ''.join(out).rstrip()
+        if line.strip():
+            lines.append(line)
+
+    # Parse using indentation stack.
+    def indent_of(s):
+        return len(s) - len(s.lstrip(' '))
+
+    def scalar(s):
+        s = s.strip()
+        if not s:
+            return None
+        if s.lower() in ('null', '~'):
+            return None
+        if s.lower() == 'true':
+            return True
+        if s.lower() == 'false':
+            return False
+        if s.startswith('"') and s.endswith('"') and len(s) >= 2:
+            return s[1:-1].encode().decode('unicode_escape')
+        if s.startswith("'") and s.endswith("'") and len(s) >= 2:
+            return s[1:-1]
+        if s.startswith('[') and s.endswith(']'):
+            inner = s[1:-1].strip()
+            if not inner:
+                return []
+            return [scalar(x) for x in _split_flow_list(inner)]
+        try:
+            if '.' in s or 'e' in s or 'E' in s:
+                return float(s)
+            return int(s)
+        except ValueError:
+            return s
+
+    def _split_flow_list(inner):
+        parts = []
+        cur = []
+        in_q = None
+        depth = 0
+        for ch in inner:
+            if in_q:
+                cur.append(ch)
+                if ch == in_q:
+                    in_q = None
+                continue
+            if ch in ('"', "'"):
+                in_q = ch
+                cur.append(ch)
+                continue
+            if ch == '[':
+                depth += 1
+                cur.append(ch)
+                continue
+            if ch == ']':
+                depth -= 1
+                cur.append(ch)
+                continue
+            if ch == ',' and depth == 0:
+                parts.append(''.join(cur).strip())
+                cur = []
+                continue
+            cur.append(ch)
+        if cur:
+            parts.append(''.join(cur).strip())
+        return parts
+
+    def parse_block(idx, base_indent):
+        # Returns (value, next_idx). Inspects the first non-empty line
+        # at >= base_indent to decide mapping vs. sequence.
+        if idx >= len(lines):
+            return None, idx
+        first = lines[idx]
+        ind = indent_of(first)
+        if ind < base_indent:
+            return None, idx
+        if first.lstrip().startswith('- '):
+            return parse_seq(idx, ind)
+        return parse_map(idx, ind)
+
+    def parse_map(idx, base_indent):
+        out = {}
+        while idx < len(lines):
+            line = lines[idx]
+            ind = indent_of(line)
+            if ind < base_indent:
+                break
+            if ind > base_indent:
+                # Shouldn't happen at top of map.
+                break
+            stripped = line.strip()
+            if stripped.startswith('- '):
+                break
+            # key: value or key:
+            if ':' not in stripped:
+                idx += 1
+                continue
+            # Split on the first ':' that isn't inside quotes.
+            key, _, rest = _split_key_value(stripped)
+            rest = rest.strip()
+            idx += 1
+            if rest == '' or rest is None:
+                # Block child.
+                if idx < len(lines) and indent_of(lines[idx]) > base_indent:
+                    child, idx = parse_block(idx, indent_of(lines[idx]))
+                    out[key] = child
+                else:
+                    out[key] = None
+            else:
+                out[key] = scalar(rest)
+        return out, idx
+
+    def _split_key_value(stripped):
+        in_q = None
+        for i, ch in enumerate(stripped):
+            if in_q:
+                if ch == in_q:
+                    in_q = None
+                continue
+            if ch in ('"', "'"):
+                in_q = ch
+                continue
+            if ch == ':':
+                key = stripped[:i].strip()
+                rest = stripped[i + 1 :]
+                # Unquote key.
+                if (key.startswith('"') and key.endswith('"')) or (
+                    key.startswith("'") and key.endswith("'")
+                ):
+                    key = key[1:-1]
+                return key, ':', rest
+        return stripped, None, ''
+
+    def parse_seq(idx, base_indent):
+        out = []
+        while idx < len(lines):
+            line = lines[idx]
+            ind = indent_of(line)
+            if ind < base_indent:
+                break
+            if ind > base_indent:
+                break
+            stripped = line.strip()
+            if not stripped.startswith('- '):
+                break
+            after_dash = stripped[2:].rstrip()
+            # Item indent = base_indent + 2 (for "- ")
+            item_inner_indent = base_indent + 2
+            idx += 1
+            if after_dash == '':
+                # Block item, child lines.
+                if idx < len(lines) and indent_of(lines[idx]) > base_indent:
+                    child, idx = parse_block(idx, indent_of(lines[idx]))
+                    out.append(child)
+                else:
+                    out.append(None)
+                continue
+            if ':' in after_dash and not (
+                after_dash.startswith('"') or after_dash.startswith("'")
+            ):
+                # Inline first key-value of a mapping item. Treat the "- "
+                # as introducing a mapping whose first key is on this line.
+                key, _, rest = _split_key_value(after_dash)
+                rest = rest.strip()
+                item = {}
+                if rest == '':
+                    if idx < len(lines) and indent_of(lines[idx]) > item_inner_indent:
+                        child, idx = parse_block(idx, indent_of(lines[idx]))
+                        item[key] = child
+                    else:
+                        item[key] = None
+                else:
+                    item[key] = scalar(rest)
+                # Continue absorbing further keys at item_inner_indent.
+                while idx < len(lines):
+                    nline = lines[idx]
+                    nind = indent_of(nline)
+                    if nind < item_inner_indent:
+                        break
+                    if nind > item_inner_indent:
+                        break
+                    nstripped = nline.strip()
+                    if nstripped.startswith('- '):
+                        break
+                    if ':' not in nstripped:
+                        idx += 1
+                        continue
+                    nkey, _, nrest = _split_key_value(nstripped)
+                    nrest = nrest.strip()
+                    idx += 1
+                    if nrest == '':
+                        if idx < len(lines) and indent_of(lines[idx]) > item_inner_indent:
+                            child, idx = parse_block(idx, indent_of(lines[idx]))
+                            item[nkey] = child
+                        else:
+                            item[nkey] = None
+                    else:
+                        item[nkey] = scalar(nrest)
+                out.append(item)
+            else:
+                out.append(scalar(after_dash))
+        return out, idx
+
+    val, _ = parse_block(0, 0)
+    return val
+
+
+# --- Config loading -------------------------------------------------
+
+ROOT = os.environ.get('TRACKER_STATS_CACHE', '/tmp/tracker-stats-cache')
+OUT_PATH = os.environ.get('TRACKER_STATS_OUT', '/tmp/airflow_s_monthly.html')
+HERE = os.path.dirname(os.path.abspath(__file__))
+DEFAULT_CONFIG_PATH = os.path.join(HERE, 'default-config.yaml')
+
+
+def deep_merge(base, overlay):
+    """Deep-merge overlay into base. Lists are REPLACED (not concatenated)."""
+    if overlay is None:
+        return base
+    if not isinstance(base, dict) or not isinstance(overlay, dict):
+        return overlay
+    out = dict(base)
+    for k, v in overlay.items():
+        if k in out and isinstance(out[k], dict) and isinstance(v, dict):
+            out[k] = deep_merge(out[k], v)
+        else:
+            out[k] = v
+    return out
+
+
+def load_config():
+    with open(DEFAULT_CONFIG_PATH) as f:
+        cfg = yaml_load(f.read()) or {}
+    overlay_path = os.environ.get('TRACKER_STATS_CONFIG')
+    if overlay_path and os.path.exists(overlay_path):
+        with open(overlay_path) as f:
+            overlay = yaml_load(f.read()) or {}
+        cfg = deep_merge(cfg, overlay)
+    # Env-var quick overrides.
+    if os.environ.get('TRACKER_STATS_BUCKETS'):
+        cfg['buckets'] = os.environ['TRACKER_STATS_BUCKETS']
+    if 'TRACKER_STATS_START' in os.environ:
+        v = os.environ['TRACKER_STATS_START']
+        cfg['start'] = v if v else None
+    if 'TRACKER_STATS_UPSTREAM_REPO' in os.environ:
+        v = os.environ['TRACKER_STATS_UPSTREAM_REPO']
+        cfg['upstream_repo'] = None if v in ('', 'none', 'null') else v
+    return cfg
+
+
+CONFIG = load_config()
+
+BUCKETS_MODE = CONFIG.get('buckets', 'monthly')
+if BUCKETS_MODE not in ('monthly', 'quarterly'):
+    raise SystemExit(f"buckets must be 'monthly' or 'quarterly', got {BUCKETS_MODE!r}")
+
+START_OVERRIDE = CONFIG.get('start')
+UPSTREAM_REPO = CONFIG.get('upstream_repo')
+SCOPE_LABELS = set(CONFIG.get('scope_labels') or [])
+MILESTONES = CONFIG.get('milestones') or []
+CATEGORIES_CFG = CONFIG.get('categories') or []
+TRIAGE_KW = CONFIG.get('triage', {}).get('keywords') or []
+BOT_PREFIXES = tuple(CONFIG.get('triage', {}).get('bot_prefixes') or [])
+
+# Distinct category names in the order they FIRST appear in CATEGORIES_CFG
+# (multiple rules can share a name to express disjoint branches of the
+# same final category).
+_seen = set()
+CATS_DEFAULT_ORDER = []
+for c in CATEGORIES_CFG:
+    if c['name'] not in _seen:
+        _seen.add(c['name'])
+        CATS_DEFAULT_ORDER.append(c['name'])
+STACK_ORDER = CONFIG.get('stack_order') or CATS_DEFAULT_ORDER
+# CATS used for snapshot counting is the distinct-name set. Plotting uses
+# STACK_ORDER (which may re-order them for visual layering).
+CATS = list(CATS_DEFAULT_ORDER)
+CAT_COLORS = {}
+for c in CATEGORIES_CFG:
+    CAT_COLORS.setdefault(c['name'], c.get('color', '#888888'))
+
+
+# --- Cache load -----------------------------------------------------
+
+with open(f'{ROOT}/issues.json') as f:
+    issues = json.load(f)
+with open(f'{ROOT}/roster.txt') as f:
+    roster = {ln.strip() for ln in f if ln.strip()}
+with open(f'{ROOT}/issue_extra.json') as f:
+    issue_extra = json.load(f)
+
+prs_cache = {}
+if UPSTREAM_REPO:
+    prs_path = f'{ROOT}/prs.json'
+    if os.path.exists(prs_path):
+        with open(prs_path) as f:
+            prs_cache = json.load(f)
+
+NOW = dt.datetime(2026, 5, 21, 0, 0, 0, tzinfo=dt.timezone.utc)
+
+if UPSTREAM_REPO:
+    # Match the original literal in the body-parse regex so an upstream
+    # of `apache/airflow` still matches the historical pre-existing
+    # `apache/airflow#NNN` references byte-for-byte.
+    repo_re = re.escape(UPSTREAM_REPO)
+    PR_PAT = re.compile(
+        rf'{repo_re}#(\d+)|https://github\.com/{repo_re}/pull/(\d+)', re.I
+    )
+else:
+    PR_PAT = None
+
+
+# --- helpers --------------------------------------------------------
+
+def parse_dt(s):
+    if not s:
+        return None
+    return dt.datetime.fromisoformat(s.replace('Z', '+00:00'))
+
+
+# --- Bucket abstraction --------------------------------------------
+
+def month_of(d):
+    return d.year, d.month
+
+
+def quarter_of(d):
+    return d.year, (d.month - 1) // 3 + 1
+
+
+def month_label(y, m):
+    return f"{y}-{m:02d}"
+
+
+def quarter_label(y, q):
+    return f"{y}-Q{q}"
+
+
+def month_end(y, m):
+    last_day = calendar.monthrange(y, m)[1]
+    return dt.datetime(y, m, last_day, 23, 59, 59, tzinfo=dt.timezone.utc)
+
+
+def quarter_end(y, q):
+    # q in {1,2,3,4}
+    last_month = q * 3
+    last_day = calendar.monthrange(y, last_month)[1]
+    return dt.datetime(y, last_month, last_day, 23, 59, 59, tzinfo=dt.timezone.utc)
+
+
+def iter_months(y0, m0, y1, m1):
+    y, m = y0, m0
+    while (y, m) <= (y1, m1):
+        yield y, m
+        m += 1
+        if m == 13:
+            m = 1
+            y += 1
+
+
+def iter_quarters(y0, q0, y1, q1):
+    y, q = y0, q0
+    while (y, q) <= (y1, q1):
+        yield y, q
+        q += 1
+        if q == 5:
+            q = 1
+            y += 1
+
+
+if BUCKETS_MODE == 'monthly':
+    bucket_of = month_of
+    bucket_label = month_label
+    bucket_end = month_end
+    bucket_iter = iter_months
+else:
+    bucket_of = quarter_of
+    bucket_label = quarter_label
+    bucket_end = quarter_end
+    bucket_iter = iter_quarters
+
+
+# --- index issues + buckets ----------------------------------------
+
+issues_by_n = {i['number']: i for i in issues}
+earliest = min(parse_dt(i['createdAt']) for i in issues)
+
+if START_OVERRIDE:
+    if BUCKETS_MODE == 'monthly':
+        y0, m0 = (int(x) for x in START_OVERRIDE.split('-'))
+        start_key = (y0, m0)
+    else:
+        y_part, q_part = START_OVERRIDE.split('-Q')
+        start_key = (int(y_part), int(q_part))
+else:
+    start_key = bucket_of(earliest)
+
+end_key = bucket_of(NOW)
+buckets = list(bucket_iter(start_key[0], start_key[1], end_key[0], end_key[1]))
+bucket_labels = [bucket_label(*b) for b in buckets]
+n_buckets = len(buckets)
+
+print(f"earliest createdAt: {earliest.isoformat()} -> starts at {bucket_label(*start_key)}")
+print(f"now: {NOW.isoformat()} -> ends at {bucket_label(*end_key)}")
+print(f"buckets in range ({BUCKETS_MODE}): {n_buckets}")
+
+# Per-issue events
+events_by_n = {}
+for n in issues_by_n:
+    p = f'{ROOT}/events/{n}.json'
+    if os.path.exists(p) and os.path.getsize(p) > 0:
+        with open(p) as f:
+            events_by_n[n] = json.load(f)
+    else:
+        events_by_n[n] = []
+
+
+# --- tracker -> linked PR list (from body parse + closedBy) --------
+
+def extract_prs_for_issue(n):
+    if not UPSTREAM_REPO:
+        return set()
+    v = issue_extra.get(str(n)) or {}
+    nums = set()
+    for ref in (v.get('closedByPullRequestsReferences') or []):
+        if ref.get('repository', {}).get('nameWithOwner') == UPSTREAM_REPO:
+            nums.add(ref['number'])
+    body = v.get('body') or ''
+    if PR_PAT is not None:
+        for m in PR_PAT.findall(body):
+            x = m[0] or m[1]
+            if x:
+                nums.add(int(x))
+    return nums
+
+
+issue_prs = {n: extract_prs_for_issue(n) for n in issues_by_n}
+
+
+def pr_meta(num):
+    """Return dict(createdAt=dt, mergedAt=dt|None, state=str) or None."""
+    v = prs_cache.get(str(num))
+    if not v or 'error' in v:
+        return None
+    return {
+        'createdAt': parse_dt(v.get('createdAt')),
+        'mergedAt': parse_dt(v.get('mergedAt')),
+        'state': v.get('state'),
+    }
+
+
+def tracker_pr_signals(n):
+    earliest_created = None
+    earliest_created_pr = None
+    earliest_merged_ts = None
+    earliest_merged_pr = None
+    for prn in issue_prs.get(n, []):
+        meta = pr_meta(prn)
+        if meta is None:
+            continue
+        c = meta['createdAt']
+        if c is not None:
+            if earliest_created is None or c < earliest_created:
+                earliest_created = c
+                earliest_created_pr = prn
+        mt = meta['mergedAt']
+        if mt is not None:
+            if earliest_merged_ts is None or mt < earliest_merged_ts:
+                earliest_merged_ts = mt
+                earliest_merged_pr = prn
+    return {
+        'first_pr_created': earliest_created,
+        'first_pr_created_num': earliest_created_pr,
+        'first_pr_merged': earliest_merged_ts,
+        'first_pr_merged_num': earliest_merged_pr,
+    }
+
+
+tracker_signals = {n: tracker_pr_signals(n) for n in issues_by_n}
+
+
+# --- label timeline replay ------------------------------------------
+
+def labels_open_at(issue, ts):
+    n = issue['number']
+    created = parse_dt(issue['createdAt'])
+    if ts < created:
+        return None, None
+    labels = set()
+    is_open = True
+    for e in events_by_n.get(n, []):
+        et = parse_dt(e['created_at'])
+        if et > ts:
+            break
+        if e['event'] == 'labeled' and e.get('label'):
+            labels.add(e['label'])
+        elif e['event'] == 'unlabeled' and e.get('label'):
+            labels.discard(e['label'])
+        elif e['event'] == 'closed':
+            is_open = False
+        elif e['event'] == 'reopened':
+            is_open = True
+    return labels, is_open
+
+
+# --- Predicate evaluator -------------------------------------------
+
+def eval_predicate(pred, ctx):
+    """Evaluate a category predicate against a snapshot context.
+
+    `ctx` keys:
+      labels (set), is_open (bool), state_reason (str|None),
+      pr_merged_by_snapshot (bool).
+    """
+    if not isinstance(pred, dict):
+        return False
+    for key, val in pred.items():
+        if key == 'any_of':
+            if not any(eval_predicate(p, ctx) for p in val):
+                return False
+        elif key == 'all_of':
+            if isinstance(val, list):
+                if not all(eval_predicate(p, ctx) for p in val):
+                    return False
+            elif isinstance(val, dict):
+                if not eval_predicate(val, ctx):
+                    return False
+            else:
+                return False
+        elif key == 'state':
+            want_open = (val == 'open')
+            if ctx['is_open'] != want_open:
+                return False
+        elif key == 'state_reason':
+            if ctx['state_reason'] != val:
+                return False
+        elif key == 'any_label':
+            if not any(l in ctx['labels'] for l in val):
+                return False
+        elif key == 'all_labels':
+            if not all(l in ctx['labels'] for l in val):
+                return False
+        elif key == 'not_label':
+            if val in ctx['labels']:
+                return False
+        elif key == 'not_any_label':
+            if any(l in ctx['labels'] for l in val):
+                return False
+        elif key == 'no_scope_label':
+            has_scope = bool(ctx['labels'] & SCOPE_LABELS)
+            if val and has_scope:
+                return False
+            if not val and not has_scope:
+                return False
+        elif key == 'has_scope_label':
+            has_scope = bool(ctx['labels'] & SCOPE_LABELS)
+            if val and not has_scope:
+                return False
+            if not val and has_scope:
+                return False
+        elif key == 'pr_merged_by_snapshot':
+            if val and not ctx['pr_merged_by_snapshot']:
+                return False
+            if not val and ctx['pr_merged_by_snapshot']:
+                return False
+        else:
+            # Unknown key: fail safe.
+            return False
+    return True
+
+
+def classify_per_config(labels, is_open, ts, n):
+    issue = issues_by_n[n]
+    state_reason = issue.get('stateReason')
+    sig = tracker_signals.get(n, {})
+    fm = sig.get('first_pr_merged')
+    pr_merged_by_snapshot = bool(UPSTREAM_REPO and fm is not None and fm <= ts)
+    ctx = {
+        'labels': labels,
+        'is_open': is_open,
+        'state_reason': state_reason,
+        'pr_merged_by_snapshot': pr_merged_by_snapshot,
+    }
+    for cat in CATEGORIES_CFG:
+        if eval_predicate(cat['predicate'], ctx):
+            return cat['name']
+    return None
+
+
+# --- snapshot counts ------------------------------------------------
+
+counts = {cat: [0] * n_buckets for cat in CATS}
+backfill_trackers = set()
+
+for bi, b in enumerate(buckets):
+    be = bucket_end(*b)
+    ts = NOW if be > NOW else be
+    for i in issues:
+        labels, is_open = labels_open_at(i, ts)
+        if labels is None:
+            continue
+        cat = classify_per_config(labels, is_open, ts, i['number'])
+        if cat is None:
+            continue
+        counts[cat][bi] += 1
+
+        if cat == 'open_pr_merged' and is_open and 'pr merged' not in labels:
+            backfill_trackers.add(i['number'])
+
+# --- cumulative opened / closed ------------------------------------
+
+cum_opened = [0] * n_buckets
+cum_closed = [0] * n_buckets
+for bi, b in enumerate(buckets):
+    be = bucket_end(*b)
+    ts = NOW if be > NOW else be
+    op = 0
+    cl = 0
+    for i in issues:
+        ca = parse_dt(i['createdAt'])
+        if ca and ca <= ts:
+            op += 1
+        cz = parse_dt(i.get('closedAt'))
+        if cz and cz <= ts:
+            cl += 1
+    cum_opened[bi] = op
+    cum_closed[bi] = cl
+
+# --- Opened-in-bucket vs untriaged-at-bucket-end ------------------
+
+opened_in_b = [0] * n_buckets
+untriaged_at_bend = counts.get('open_untriaged', [0] * n_buckets)
+
+for i in issues:
+    ca = parse_dt(i['createdAt'])
+    if ca is None:
+        continue
+    cb = bucket_of(ca)
+    if cb < buckets[0] or cb > buckets[-1]:
+        continue
+    bi = buckets.index(cb)
+    opened_in_b[bi] += 1
+
+# --- triage / response ---------------------------------------------
+
+# Build the triage regex from config. Wrap single-word keywords (all-caps
+# or all-lowercase, no spaces or hyphens) in word boundaries so they don't
+# match substrings inside other words (mirrors the original handwritten regex).
+_kw_parts = []
+for kw in TRIAGE_KW:
+    if kw.isupper() and ' ' not in kw and '-' not in kw:
+        _kw_parts.append(rf'\b{re.escape(kw)}\b')
+    elif kw.isalpha() and kw.islower() and ' ' not in kw:
+        _kw_parts.append(rf'\b{re.escape(kw)}\b')
+    else:
+        _kw_parts.append(re.escape(kw))
+TRIAGE_RE = re.compile('|'.join(_kw_parts), re.IGNORECASE) if _kw_parts else None
+
+
+def is_bot_body(body):
+    if not body:
+        return False
+    b = body.lstrip()
+    for p in BOT_PREFIXES:
+        if b.startswith(p):
+            return True
+    return False
+
+
+triage_hours_by_b = defaultdict(list)
+resp_hours_by_b = defaultdict(list)
+n_fallback_triage = 0
+n_no_triage = 0
+all_triage_hours = []
+
+for i in issues:
+    created = parse_dt(i['createdAt'])
+    blbl = bucket_label(*bucket_of(created))
+    comments = i.get('comments', []) or []
+
+    first_roster = None
+    first_roster_keyword = None
+    for c in comments:
+        author = (c.get('author') or {}).get('login')
+        if not author or author not in roster:
+            continue
+        if is_bot_body(c.get('body') or ''):
+            continue
+        ct = parse_dt(c['createdAt'])
+        if first_roster is None:
+            first_roster = ct
+        if (
+            first_roster_keyword is None
+            and TRIAGE_RE is not None
+            and TRIAGE_RE.search(c.get('body') or '')
+        ):
+            first_roster_keyword = ct
+        if first_roster is not None and first_roster_keyword is not None:
+            break
+
+    if first_roster is not None:
+        hours = (first_roster - created).total_seconds() / 3600
+        resp_hours_by_b[blbl].append(hours)
+
+    triage_ts = first_roster_keyword if first_roster_keyword is not None else first_roster
+    if triage_ts is None:
+        n_no_triage += 1
+        continue
+    if first_roster_keyword is None:
+        n_fallback_triage += 1
+    hours = (triage_ts - created).total_seconds() / 3600
+    triage_hours_by_b[blbl].append(hours)
+    all_triage_hours.append(hours)
+
+
+def mean_or_none(xs):
+    return round(statistics.mean(xs), 2) if xs else None
+
+
+def per_b_series(by_b):
+    ys = []
+    ns = []
+    for b in buckets:
+        lbl = bucket_label(*b)
+        xs = by_b.get(lbl, [])
+        ys.append(mean_or_none(xs))
+        ns.append(len(xs))
+    return ys, ns
+
+
+triage_ys, triage_ns = per_b_series(triage_hours_by_b)
+resp_ys, resp_ns = per_b_series(resp_hours_by_b)
+
+triage_median = round(statistics.median(all_triage_hours), 2) if all_triage_hours else None
+triage_mean = round(statistics.mean(all_triage_hours), 2) if all_triage_hours else None
+triage_n = len(all_triage_hours)
+
+
+# --- PR-driven mean-time metrics -----------------------------------
+
+prc_by_b = defaultdict(list)
+prm_by_b = defaultdict(list)
+rel_by_b = defaultdict(list)
+
+
+def first_label_time(n, label):
+    for e in events_by_n.get(n, []):
+        if e['event'] == 'labeled' and e.get('label') == label:
+            return parse_dt(e['created_at'])
+    return None
+
+
+if UPSTREAM_REPO:
+ for i in issues:
+ n = i['number']
+ created = parse_dt(i['createdAt'])
+ sig = tracker_signals.get(n, {})
+
+ first_pr_c = sig.get('first_pr_created')
+ first_pr_m = sig.get('first_pr_merged')
+
+ if first_pr_c and created and first_pr_c >= created:
+ days = (first_pr_c - created).total_seconds() / 86400
+ prc_by_b[bucket_label(*bucket_of(created))].append(days)
+
+ if first_pr_m is not None:
+ prn = sig.get('first_pr_merged_num')
+ meta = pr_meta(prn) if prn else None
+ if meta and meta['createdAt'] and meta['mergedAt'] and meta['mergedAt'] >= meta['createdAt']:
+ days = (meta['mergedAt'] - meta['createdAt']).total_seconds() / 86400
+ prm_by_b[bucket_label(*bucket_of(meta['createdAt']))].append(days)
+
+ if first_pr_m is not None:
+ announced = (first_label_time(n, 'announced - emails sent')
+ or first_label_time(n, 'announced'))
+ rel_ts = announced
+ if rel_ts is None:
+ ca = parse_dt(i.get('closedAt'))
+ state_reason = i.get('stateReason')
+ cur_labels = {l['name'] for l in i.get('labels', [])}
+ is_closed_completed = (i.get('state') == 'CLOSED' and state_reason == 'COMPLETED')
+ has_cve = 'cve allocated' in cur_labels
+ if ca and is_closed_completed and has_cve:
+ rel_ts = ca
+ if rel_ts is not None and rel_ts > first_pr_m:
+ days = (rel_ts - first_pr_m).total_seconds() / 86400
+ rel_by_b[bucket_label(*bucket_of(first_pr_m))].append(days)
+
+
+prc_ys, prc_ns = per_b_series(prc_by_b)
+prm_ys, prm_ns = per_b_series(prm_by_b)
+rel_ys, rel_ns = per_b_series(rel_by_b)
+
+
+def n_buckets_with_data(by_b):
+ return sum(1 for k, xs in by_b.items() if xs)
+
+
+def overall_median(by_b):
+ flat = [x for xs in by_b.values() for x in xs]
+ return round(statistics.median(flat), 2) if flat else None
+
+
+def overall_n(by_b):
+ return sum(len(xs) for xs in by_b.values())
+
+
+# --- KPIs ----------------------------------------------------------
+
+total = len(issues)
+open_now = sum(1 for i in issues if i.get('state') == 'OPEN')
+closed_now = total - open_now
+
+def latest(cat):
+ return counts[cat][-1] if cat in counts else 0
+
+
+print(f"total trackers: {total}")
+print(f"open: {open_now}, closed: {closed_now}")
+print(f"fixed_released (latest bucket): {latest('fixed_released')}")
+print(f"open_untriaged: {latest('open_untriaged')}, open_triaged: {latest('open_triaged')}, "
+ f"open_pr_merged: {latest('open_pr_merged')}, closed_other: {latest('closed_other')}")
+print(f"triage median {triage_median}h, mean {triage_mean}h, n={triage_n} "
+ f"(fallback={n_fallback_triage}, none={n_no_triage})")
+
+if UPSTREAM_REPO:
+ print()
+ print("PR-driven mean-time series:")
+ for name, by_b in [
+ ('c_prc', prc_by_b),
+ ('c_prm', prm_by_b),
+ ('c_rel', rel_by_b),
+ ]:
+ print(f" {name}: n={overall_n(by_b)} median={overall_median(by_b)} "
+ f"buckets_with_data={n_buckets_with_data(by_b)}")
+
+print()
+print(f"open_pr_merged back-fill: {len(backfill_trackers)} trackers were re-classified "
+ f"from open_triaged -> open_pr_merged in at least one historical bucket "
+ f"because of the PR-merge-date rule")
+print()
+print(f"Latest bucket ({bucket_labels[-1]}) opened-vs-untriaged: "
+ f"opened_in_b={opened_in_b[-1]}, untriaged_at_bend={untriaged_at_bend[-1]}")
+
+
+# --- Render HTML ---------------------------------------------------
+
+def js_array(xs, fmt_null='null'):
+ parts = []
+ for x in xs:
+ if x is None:
+ parts.append(fmt_null)
+ elif isinstance(x, float):
+ parts.append(f"{x:.2f}" if not (x == int(x)) else f"{int(x)}")
+ else:
+ parts.append(str(x))
+ return '[' + ', '.join(parts) + ']'
+
+
+def js_quotes(xs):
+ return '[' + ', '.join(f'"{x}"' for x in xs) + ']'
+
+
+def milestone_x(milestone_date):
+ """Map a milestone date (YYYY-MM-DD) onto a bucket-axis label."""
+ y = int(milestone_date[:4])
+ mo = int(milestone_date[5:7])
+ if BUCKETS_MODE == 'monthly':
+ return f"{y}-{mo:02d}"
+ return f"{y}-Q{(mo - 1) // 3 + 1}"
+
+
+# Title prefix differs between bucket modes for clarity.
+bucket_word = 'month' if BUCKETS_MODE == 'monthly' else 'quarter'
+
+# Build stacked-band traces in STACK_ORDER. With the default config that
+# resolves to `fixed_released, open_pr_merged, open_triaged,
+# open_untriaged, closed_other` — matching the reference dashboard.
+stacked_traces = []
+for cat in STACK_ORDER:
+ if cat not in counts:
+ continue
+ color = CAT_COLORS.get(cat, '#888888')
+ ys = js_array(counts[cat])
+ stacked_traces.append(
+ f" {{x: buckets, y: {ys}, name: '{cat}', stackgroup: 'one', "
+ f"type: 'scatter', mode: 'lines', line: {{color: '{color}', width: 0}}, "
+ f"fillcolor: '{color}', hoveron: 'points+fills'}}"
+ )
+stacked_block = ',\n'.join(stacked_traces)
+
+# Milestone shapes + annotations (multi-milestone capable).
+ms_shapes = []
+ms_annots = []
+for ms in MILESTONES:
+ ms_date = ms.get('date')
+ ms_label = ms.get('label') or 'milestone'
+ if not ms_date:
+ continue
+ x_val = milestone_x(str(ms_date))
+ ms_shapes.append(
+ "{type: 'line', xref: 'x', yref: 'paper', x0: '" + x_val
+ + "', x1: '" + x_val
+ + "', y0: 0, y1: 1, line: {color: '#888', width: 1.5, dash: 'dash'}}"
+ )
+ ms_annots.append(
+ "{xref: 'x', yref: 'paper', x: '" + x_val
+ + "', y: 1.04, xanchor: 'left', text: '↓ " + ms_label + " (" + str(ms_date) + ")', "
+ + "showarrow: false, font: {size: 11, color: '#666'}}"
+ )
+shapes_js = '[' + ', '.join(ms_shapes) + ']'
+annots_js = '[' + ', '.join(ms_annots) + ']'
+
+
+# Build the optional PR-charts HTML and JS sections.
+if UPSTREAM_REPO:
+ pr_cards_html = (
+ '<div class="card"><div id="c_prc"></div></div>\n'
+ '<div class="card"><div id="c_prm"></div></div>\n'
+ '<div class="card"><div id="c_rel"></div></div>\n'
+ )
+ pr_charts_js = (
+ f"meanChart('c_prc', 'Mean time createdAt → PR opened (days)', "
+ f"{js_array(prc_ys)}, {js_array(prc_ns)}, 'd', '#16a085');\n"
+ f"meanChart('c_prm', 'Mean time PR-open → PR-merged (days)', "
+ f"{js_array(prm_ys)}, {js_array(prm_ns)}, 'd', '#2980b9');\n"
+ f"meanChart('c_rel', 'Mean time PR-merged → advisory announced (days)', "
+ f"{js_array(rel_ys)}, {js_array(rel_ns)}, 'd', '#d35400');"
+ )
+else:
+ pr_cards_html = ''
+ pr_charts_js = ''
+
+
+HTML = f"""<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="utf-8">
+<title>tracker {bucket_word}ly statistics</title>
+<script src="https://cdn.plot.ly/plotly-2.35.2.min.js"></script>
+<style>
+body {{ font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif; margin: 0 auto; padding: 16px; color: #222; max-width: 1400px; }}
+.grid {{ display: grid; grid-template-columns: 1fr 1fr; gap: 16px; }}
+.card {{ border: 1px solid #e0e0e0; border-radius: 8px; padding: 8px; background: #fafafa; }}
+.card.full {{ grid-column: 1 / -1; }}
+</style>
+</head>
+<body>
+
+<div class="grid">
+
+<div class="card full"><div id="c_states"></div></div>
+<div class="card full"><div id="c_open_vs_untriaged"></div></div>
+<div class="card full"><div id="c_cum"></div></div>
+<div class="card"><div id="c_triage"></div></div>
+<div class="card"><div id="c_resp"></div></div>
+{pr_cards_html}
+</div>
+
+<script>
+const buckets = {js_quotes(bucket_labels)};
+
+function lineOpts() {{ return {{ type: 'scatter', mode: 'lines+markers', connectgaps: true }}; }}
+
+// Milestone markers (config-driven).
+const milestoneShapes = {shapes_js};
+const milestoneAnnotations = {annots_js};
+const MILESTONES_LAYOUT = {{shapes: milestoneShapes, annotations: milestoneAnnotations}};
+
+// Stacked-line lifecycle bands
+Plotly.newPlot('c_states', [
+{stacked_block}
+], {{
+ ...MILESTONES_LAYOUT,
+ title: 'Issue lifecycle bands (stacked, end-of-{bucket_word} snapshots)',
+ yaxis: {{title: 'tracker count'}},
+ legend: {{orientation: 'h'}},
+ hovermode: 'x unified'
+}});
+
+// Opened-in-bucket vs untriaged-at-bucket-end
+Plotly.newPlot('c_open_vs_untriaged', [
+ {{x: buckets, y: {js_array(opened_in_b)}, name: 'opened in {bucket_word}',
+ type: 'scatter', mode: 'lines+markers', connectgaps: true,
+ line: {{color: '#1f77b4'}}}},
+ {{x: buckets, y: {js_array(untriaged_at_bend)}, name: 'untriaged at {bucket_word}-end',
+ type: 'scatter', mode: 'lines+markers', connectgaps: true,
+ line: {{color: '#d62728'}}}}
+], {{
+ ...MILESTONES_LAYOUT,
+ title: 'Opened vs. untriaged backlog (per {bucket_word})',
+ yaxis: {{title: 'count'}},
+ legend: {{orientation: 'h'}}
+}});
+
+Plotly.newPlot('c_cum', [
+ {{x: buckets, y: {js_array(cum_opened)}, name: 'cumulative opened',
+ type: 'scatter', mode: 'lines+markers', connectgaps: true,
+ line: {{color: '#1f77b4'}}, fill: 'tozeroy'}},
+ {{x: buckets, y: {js_array(cum_closed)}, name: 'cumulative closed',
+ type: 'scatter', mode: 'lines+markers', connectgaps: true,
+ line: {{color: '#2ca02c'}}, fill: 'tozeroy'}}
+], {{
+ ...MILESTONES_LAYOUT,
+ title: 'Cumulative opened vs. closed (gap = open backlog)',
+ yaxis: {{title: 'count'}},
+ legend: {{orientation: 'h'}}
+}});
+
+function meanChart(divId, title, ys, ns, unit, color) {{
+ Plotly.newPlot(divId, [{{
+ x: buckets, y: ys,
+ type: 'scatter', mode: 'lines+markers', connectgaps: true,
+ text: ns.map(n => 'n=' + n),
+ hovertemplate: '%{{x}}<br>mean: %{{y:.2f}} ' + unit + '<br>%{{text}}<extra></extra>',
+ line: {{color: color}}
+ }}], {{
+ ...MILESTONES_LAYOUT,
+ title: title,
+ yaxis: {{title: 'mean ' + unit, rangemode: 'tozero'}}
+ }});
+}}
+
+meanChart('c_triage', 'Mean time to triage (hours)', {js_array(triage_ys)}, {js_array(triage_ns)}, 'h', '#c0392b');
+meanChart('c_resp', 'Mean time to first response (hours)', {js_array(resp_ys)}, {js_array(resp_ns)}, 'h', '#8e44ad');
+{pr_charts_js}
+</script>
+</body>
+</html>
+"""
+
+with open(OUT_PATH, 'w', encoding='utf-8') as f:
+ f.write(HTML)
+
+print(f"\nWrote {OUT_PATH} ({len(HTML.encode('utf-8'))} bytes)")
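
Review note on the render.py helpers above (`js_array`, `js_quotes`, and the milestone annotation strings): they interpolate values into the generated script without escaping, so a bucket label or milestone label containing a quote character would likely produce invalid JavaScript. A minimal sketch of one alternative, assuming that plain JSON output is acceptable to the Plotly calls; `to_js` is a hypothetical helper name and is not part of this commit.

```python
import json
import math


def to_js(values):
    """Serialize a list of numbers, strings, or None as a JavaScript array literal.

    Hypothetical helper (not in the commit): relies on JSON arrays of these
    types also being valid JavaScript array literals.
    """
    cleaned = []
    for v in values:
        # NaN and infinity have no JSON representation; emit null instead.
        if isinstance(v, float) and not math.isfinite(v):
            cleaned.append(None)
        else:
            cleaned.append(v)
    # Escape "</" so that a value cannot close the surrounding <script> tag.
    return json.dumps(cleaned).replace("</", "<\\/")


if __name__ == "__main__":
    print(to_js([1.5, None, 3]))
    print(to_js(["2024-01", 'label with "quotes"']))
```

Because `mean_or_none` already rounds to two decimals, the values should carry the same precision as what `js_array` emits, although whole-number floats would be written as `3.0` rather than `3`.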
diff --git a/tools/security-tracker-stats-dashboard/run.sh b/tools/security-tracker-stats-dashboard/run.sh
new file mode 100755
index 0000000..9e7e01a
--- /dev/null
+++ b/tools/security-tracker-stats-dashboard/run.sh
@@ -0,0 +1,58 @@
+#!/bin/bash
+# Orchestrator - fetch all data then render the dashboard.
+#
+# Usage: run.sh [output-path]
+#
+# Env overrides:
+# TRACKER_STATS_CACHE (default: /tmp/tracker-stats-cache)
+# TRACKER_STATS_OUT (default: /tmp/airflow_s_monthly.html - or arg $1)
+# TRACKER_STATS_REPO tracker repo (default: airflow-s/airflow-s)
+# TRACKER_STATS_BUCKETS monthly | quarterly (overlay)
+# TRACKER_STATS_START "YYYY-MM" / "YYYY-Qn" (overlay)
+# TRACKER_STATS_UPSTREAM_REPO upstream repo slug or "none" (overlay)
+# TRACKER_STATS_CONFIG path to a YAML overlay file
+#
+# render.py reads its config from `scripts/default-config.yaml`,
+# optionally overlaid by $TRACKER_STATS_CONFIG and the env-var quick
+# overrides above. See default-config.yaml for the schema.
+
+set -e
+HERE="$(cd "$(dirname "$0")" && pwd)"
+
+if [ -n "$1" ]; then
+ export TRACKER_STATS_OUT="$1"
+fi
+
+# Prefer python with PyYAML if available; render.py falls back to a tiny
+# built-in YAML subset parser when pyyaml is missing. Adopters who use
+# `uv` can opt in to a clean PyYAML invocation by setting
+# TRACKER_STATS_PY=uv-yaml; default is plain python3.
+PY="${TRACKER_STATS_PY:-python3}"
+case "$PY" in
+ uv-yaml)
+ PY_CMD=(uv run --with pyyaml python3)
+ ;;
+ *)
+ PY_CMD=("$PY")
+ ;;
+esac
+
+echo "-> fetch_issues"
+"${PY_CMD[@]}" "$HERE/fetch_issues.py"
+
+echo "-> fetch_roster"
+"${PY_CMD[@]}" "$HERE/fetch_roster.py"
+
+echo "-> fetch_bodies"
+"${PY_CMD[@]}" "$HERE/fetch_bodies.py"
+
+echo "-> fetch_events"
+"${PY_CMD[@]}" "$HERE/fetch_events.py"
+
+echo "-> fetch_prs"
+"${PY_CMD[@]}" "$HERE/fetch_prs.py"
+
+echo "-> render"
+"${PY_CMD[@]}" "$HERE/render.py"
+
+echo "done: ${TRACKER_STATS_OUT:-/tmp/airflow_s_monthly.html}"
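
Review note on the configuration layering described in the run.sh header comments above: defaults from `scripts/default-config.yaml`, then the YAML overlay, then the env-var quick overrides. The loader in render.py is not shown in this part of the diff, so the following is only a sketch under assumptions. The config key names (`buckets`, `start`, `upstream_repo`), the default values, and the precedence order (env vars applied last) are all hypothetical; only the environment variable names and the `monthly | quarterly` and `"none"` values come from the comments.

```python
import os

# Hypothetical defaults; the real ones live in scripts/default-config.yaml.
DEFAULTS = {"buckets": "monthly", "start": None, "upstream_repo": None}

# Env var name -> config key. The key names are assumptions for illustration.
ENV_KEYS = {
    "TRACKER_STATS_BUCKETS": "buckets",
    "TRACKER_STATS_START": "start",
    "TRACKER_STATS_UPSTREAM_REPO": "upstream_repo",
}


def resolve_config(overlay=None, env=None):
    """Merge defaults, an already-parsed YAML overlay, and env-var overrides."""
    env = os.environ if env is None else env
    cfg = dict(DEFAULTS)
    cfg.update(overlay or {})
    # Assumed precedence: a non-empty env var wins over the overlay.
    for var, key in ENV_KEYS.items():
        value = env.get(var)
        if value:
            cfg[key] = value
    if cfg["buckets"] not in ("monthly", "quarterly"):
        raise ValueError(f"unsupported bucket mode: {cfg['buckets']!r}")
    # The literal string "none" disables the upstream repo.
    repo = cfg["upstream_repo"]
    if isinstance(repo, str) and repo.lower() == "none":
        cfg["upstream_repo"] = None
    return cfg


if __name__ == "__main__":
    print(resolve_config({"buckets": "quarterly"}))
```

Passing `env` explicitly keeps the function testable without touching the process environment. A falsy `upstream_repo` would correspond to the `if UPSTREAM_REPO:` branches in render.py skipping the PR-driven charts.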