This is an automated email from the ASF dual-hosted git repository.
potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git
The following commit(s) were added to refs/heads/main by this push:
new d294802 docs(mode-economics): fix internal inconsistency and
amortisation basis (#326)
d294802 is described below
commit d2948021b6796135bff11cee5f74fa7ca55a69f4
Author: André Ahlert <[email protected]>
AuthorDate: Tue May 26 18:55:34 2026 -0300
docs(mode-economics): fix internal inconsistency and amortisation basis
(#326)
Three small corrections to docs/mode-economics.md, all confined to the
existing "Reducing costs" and "Local and self-hosted inference"
sections — no new claims, only tightening claims already on the page.
1. Cache section referenced "the skill file (3 000–6 000 tokens)" as
the ideal cache candidate. That figure is the pre-correction anchor
the PR description of #253 explicitly called out as wrong; the
"What 'tokens' means here" table on the same page now reports
measured ranges (small ~1k–3k; typical ~3.5k–9k, median ~5.3k;
large security skills ~11k–36k). Replace the inline figure with a
pointer to the corrected anchor so the doc no longer contradicts
itself.
2. While that paragraph is open: add a one-sentence TTL caveat to the
cache recommendation. Anthropic's prompt cache TTL is 5 min default
(1 h extended at higher write cost), so the "first invocation pays;
subsequent invocations cheap" pattern is real for bursty same-session
workloads but typically misses for periodic triage / mentor replies
spaced through a day — exactly the workloads the same page lists
above. Flagging the constraint avoids the footgun.
3. Local-inference table listed "~$0.10–0.50/hr amortised" with no
amortisation basis. "Amortised" needs a denominator to be
interpretable. Inline the assumption (capex over ~3 yr lifespan ×
moderate utilisation) so a reader can sanity-check the range against
their own hardware and utilisation profile.
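As a rough sanity check on the range in item 3, the amortisation can be sketched in Python. The purchase prices and utilisation fractions below are illustrative assumptions, not figures from docs/mode-economics.md, and the function name is hypothetical. Only capex is counted, matching the stated basis; electricity and maintenance are ignored.

```python
HOURS_PER_YEAR = 24 * 365  # 8760; leap days ignored for a rough estimate


def amortised_hourly_cost(capex_usd: float, lifespan_years: float, utilisation: float) -> float:
    """Return hardware cost per hour of actual use.

    capex_usd      -- purchase price of the hardware (assumed value)
    lifespan_years -- period the purchase is written off over (~3 yr in the doc)
    utilisation    -- fraction of wall-clock hours the GPU is doing useful work
    """
    if capex_usd < 0 or lifespan_years <= 0:
        raise ValueError("capex must be >= 0 and lifespan must be > 0")
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    used_hours = lifespan_years * HOURS_PER_YEAR * utilisation
    return capex_usd / used_hours


if __name__ == "__main__":
    # Hypothetical consumer-GPU scenarios: (price in USD, utilisation fraction).
    for capex, util in [(800, 0.30), (1500, 0.25), (2500, 0.20)]:
        cost = amortised_hourly_cost(capex, lifespan_years=3, utilisation=util)
        print(f"${capex} at {util:.0%} utilisation: ${cost:.2f}/hr")
```

Under these assumed inputs the results come out at roughly $0.10, $0.23 and $0.48 per hour, so the documented range is plausible for a 3-year write-off at 20–30% utilisation. Lower utilisation pushes the figure up quickly.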
No table structure changes, no methodology changes, no new rows. The page
still does not carry a measurement-date / coverage / tokenizer banner;
those are separate, broader concerns better raised as an issue.
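The TTL caveat in item 2 can be illustrated with a toy model in Python. It assumes every invocation either writes or refreshes the cache entry, so a call is a hit only if it arrives within the TTL of the previous call. The function name and the timestamps are hypothetical, and real provider behaviour may differ in detail.

```python
from typing import Iterable


def count_cache_hits(invocation_times_s: Iterable[float], ttl_s: float = 300.0) -> int:
    """Count invocations that would find a still-warm cache entry.

    invocation_times_s -- invocation timestamps in seconds (any order)
    ttl_s              -- cache time-to-live; 300 s mirrors the 5 min default
                          cited in the commit message
    """
    hits = 0
    previous = None
    for t in sorted(invocation_times_s):
        # Hit if the previous call wrote or refreshed the entry recently enough.
        if previous is not None and t - previous <= ttl_s:
            hits += 1
        previous = t
    return hits


if __name__ == "__main__":
    bursty = [0, 60, 120, 180]            # four calls, one minute apart
    periodic = [0, 7200, 14400, 21600]    # four calls, two hours apart
    print(count_cache_hits(bursty))                   # 3
    print(count_cache_hits(periodic))                 # 0
    print(count_cache_hits(periodic, ttl_s=3600.0))   # 0, even with a 1 h TTL
```

In this model the bursty session pays for one cache write and then hits three times, while the periodic schedule pays the write cost on every call.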
---
docs/mode-economics.md | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/docs/mode-economics.md b/docs/mode-economics.md
index b5ee098..3bc837c 100644
--- a/docs/mode-economics.md
+++ b/docs/mode-economics.md
@@ -201,7 +201,7 @@ per-token billing to hardware:
| Inference path | Per-token cost | Typical hardware cost | Notes |
|---|---|---|---|
-| Consumer GPU, Small-class quantised model | $0 | ~$0.10–0.50/hr amortised | Viable for Triage and short Mentoring/Drafting |
+| Consumer GPU, Small-class quantised model | $0 | ~$0.10–0.50/hr (capex amortised over ~3 yr lifespan × moderate utilisation) | Viable for Triage and short Mentoring/Drafting |
| Cloud spot GPU, Mid-tier model | $0 | ~$1–4/hr depending on GPU class | Viable for all modes; latency is higher than hosted APIs |
| CPU-only, quantised Small model | $0 | Near-zero | Very slow; not recommended for interactive Pairing |
@@ -223,9 +223,14 @@ paths use identical skill code to hosted paths.
skill read only what is relevant.
3. **Cache skill context.** Most agent CLIs support prompt-level
- caching. The skill file (3 000–6 000 tokens) and stable project
- configuration files are ideal cache candidates — the first invocation
- pays; subsequent invocations are cheap on the cached portion.
+ caching. The skill file (size varies by skill class; see
+ [What "tokens" means here](#what-tokens-means-here)) and stable
+ project configuration files are ideal cache candidates — the first
+ invocation pays; subsequent invocations are cheap on the cached
+ portion. Note: most provider caches have a short TTL (Anthropic
+ prompt cache: 5 min default, 1 h extended at higher write cost),
+ so bursty same-session workloads benefit most; periodic triage runs
+ spaced hours apart will typically miss the cache.
4. **Batch triage.** `issue-reassess` and `pr-management-stats`
amortise context load across a pool. Running them weekly rather than