Re: [PR] docs(pr-management-stats): caveman-style rewrite (pilot) [airflow-steward]

via GitHub Thu, 07 May 2026 07:40:28 -0700


andreahlert commented on PR #87:
URL: https://github.com/apache/airflow-steward/pull/87#issuecomment-4398095052

> Hey Everyone here: WDYT ? (@andreahlert @johnslavik @choo121600 ?
especially). Since the mission of the project is to be affordable, I think this
makes a lot of sense even if prose gets a little bit more difficult to read by
human.
>
> Let me know what you think -> if this will be acceptable, I plan to create
an issue with tasks and we could attempt to convert all of the SKILLS/prose
this way?
>
> Particularly when testing wiht @shahar1 - the tokens on smaller
subscription got very quickly eaten.

hey! Thanks for that. i took some time to fully test with Claude, and
measured this end to end before commenting. scripts and raw numbers:
[[gist](https://gist.github.com/andreahlert/f76cd049b7418aa39b34f73a4c5df8d7)](https://gist.github.com/andreahlert/f76cd049b7418aa39b34f73a4c5df8d7)

i'm against merging this as a pilot for the 80 file umbrella. two reasons.

the `-24%` in the body is a word count. on the actual claude tokenizer the
file drops `-9.4%` in isolation. in a realistic mid-session turn (system
prompt, tools, CLAUDE.md, prior conversation, skill body) the saving is
`-2.10%`. on long sessions (100k+ history) it sits between `-0.4%` and `-1%`.
the pitch and the runtime saving are off by 12x to 60x depending on session
length. that's not a margin worth committing review budget on across 80 files.

second, the body claims frontmatter stayed verbatim. it didn't. the diff
rewrites `description` and `when_to_use`, which is the trigger and recall
surface. that needs to either revert or land with a recall test, not slip in
under a "body only" framing.

i'm aligned with the affordability mission, that's not in dispute. but the
sequencing here, imho, is wrong. A/B/C/D modes aren't all shipped. we don't
have telemetry on which sections are actually hot. optimizing prose density on
cold paths before the platform is at full potential locks in style overhead for
marginal real-world saving.

two cheap moves that dominate this on ROI right now and i'd rather see land
first:

haiku 4.5 for skill routing. 20 probe A/B across haiku/sonnet/opus, haiku at
90% accuracy, 15x cheaper than opus, 5x faster. one failure mode (hallucinated
skill name), gateable with a name validator. that's a `-15x` cost cut on the
routing layer with one PR.

boilerplate dedup. `Adopter overrides`, `Snapshot drift`, `Prerequisites`
repeat across nearly every SKILL.md. extracting them saves around 5,900 tokens
one shot, no maintenance debt, no style lock in. one PR, three sections.

my position: is to hold this PR and the umbrella. ship A/B/C/D, get
telemetry, then apply caveman targeted at hot sections where it actually pays.
caveman blanket before that could let us in a wrong order of operations.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] docs(pr-management-stats): caveman-style rewrite (pilot) [airflow-steward]

Reply via email to