> I wonder if we should separate deterministic assessment from narrative
> synthesis.

What works very well, I think, is indeed **mixing** those. SKILLs are
actually very versatile when paired with MCPs that provide good APIs hiding
some of the internal complexity of the systems they interact with.

The nice thing about having some "higher abstraction" deterministic CLIs
and scripts that take parameters is that you can drive them with natural
language much more easily than you can use natural language to change the
code and iterate on it. For example, to change the report type, aggregation
method, or selection criteria, it's great to turn all possible criteria into
a deterministic script that retrieves/aggregates the right data, and then
let the LLM interpret your intention when calling that script with a
specific parameter set. This lets you modify "what you want" from the report
by describing it in English, while the "how you retrieve it" remains
implemented in a traditional, deterministic way, which is faster, cheaper
and, well, deterministic. The analysis of the output should then be done on
the returned data, based on your expectations.
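
To make the "what" vs. "how" split concrete, here is a minimal sketch of
such a parameterised script. Everything in it is an assumption for
illustration only: the metric names, the flags, the sample data, and the
30-day approximation of a month are hypothetical and not taken from our
actual tooling.

```python
import argparse
from datetime import date, timedelta
from statistics import mean

# Hypothetical sample data: (day, metric name, value). A real script would
# fetch this from the project's tooling instead of hard-coding it.
SAMPLE_EVENTS = [
    (date(2026, 5, 1), "prs", 4),
    (date(2026, 4, 1), "prs", 6),
    (date(2026, 1, 15), "prs", 10),
    (date(2026, 5, 10), "commits", 20),
]

# The only aggregation methods the script accepts; the LLM picks one by name.
AGGREGATIONS = {"sum": sum, "mean": mean, "max": max}


def aggregate(events, metric, months, how, today):
    """Aggregate one metric over the last `months` months (30 days each)."""
    cutoff = today - timedelta(days=30 * months)
    values = [v for day, name, v in events if name == metric and day >= cutoff]
    return AGGREGATIONS[how](values) if values else 0


def main(argv=None):
    """Parse the parameter set, compute the value, print and return it."""
    parser = argparse.ArgumentParser(description="Deterministic report data")
    parser.add_argument("--metric", choices=["prs", "commits"], default="prs")
    parser.add_argument("--months", type=int, choices=[3, 6, 12], default=3)
    parser.add_argument("--how", choices=sorted(AGGREGATIONS), default="sum")
    args = parser.parse_args(argv)
    # The reference date is fixed here only to keep the sketch reproducible.
    result = aggregate(SAMPLE_EVENTS, args.metric, args.months, args.how,
                       today=date(2026, 5, 30))
    print(f"{args.metric} {args.how} over {args.months}m: {result}")
    return result
```

A request such as "show me the total PRs for the last half year" would then
only need to be translated into `main(["--metric", "prs", "--months", "6"])`,
while the retrieval and aggregation stay in plain, testable code.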

I think this is the biggest "learning" for us when developing such an
agentic solution: learning when to use what:

* Hooks to add deterministic guards on certain "MUST HAVEs" or "DON'T DOs"
* SKILL English for conversational flexibility
* Deterministic code to perform fast and cheap operations that provide a
good level of abstraction for the conversation
* English again to assess what comes out of it
* MCP to write multiple SKILLS using the reusable abstraction for various
users with different needs
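
As a rough illustration of the first bullet, a deterministic guard can be
as small as the sketch below. It does not use any particular agent
framework's hook API; the function name, the required sections and the
forbidden phrase are all hypothetical placeholders.

```python
# Hypothetical "MUST HAVEs": sections every draft report has to contain.
REQUIRED_SECTIONS = ["Activity", "Health concerns"]
# Hypothetical "DON'T DOs": phrases that must never reach the final report.
FORBIDDEN_PHRASES = ["todo"]


def check_draft(draft: str) -> list[str]:
    """Return a list of rule violations; an empty list means the draft passes."""
    lowered = draft.lower()
    problems = [
        f"missing section: {section}"
        for section in REQUIRED_SECTIONS
        if section.lower() not in lowered
    ]
    problems += [
        f"forbidden phrase: {phrase}"
        for phrase in FORBIDDEN_PHRASES
        if phrase in lowered
    ]
    return problems
```

The point of the design is that the check is cheap and gives the same answer
every time, so it can block a draft regardless of what the SKILL text or the
model decided to do.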

That's my current experience - so far at least - that we should use the
right tool for the right job; nothing really new under the sun :).

Side comment: I have a feeling that the next-level shift in our human-human
discussions should be from "Architecture of code" and "Does this code look
good aesthetically" to "Which tools to use for what." I've had a number of
discussions of this type recently, and we should build our experience by
trying (and failing/succeeding with) various approaches :)

J.


On Sat, May 30, 2026 at 3:45 PM Vladimir Sitnikov <
[email protected]> wrote:

> Hi,
>
> The current prompt is a good start, but I think some of the assessment
> rules need to be made more explicit. For example, it asks the model to
> compare 3m/6m/12m windows and describe whether activity is growing, stable,
> or declining, but it does not define what those terms mean. Is a 10% change
> meaningful? 25%? Should a drop in absolute numbers matter if the number of
> contributors is still healthy? Similarly, it asks the model to surface
> health concerns, but does not define the judgment rules for when mentor
> sign-off rate, reviewer diversity, bus factor, mailing-list activity, or
> release cadence become concerns.
>
> I wonder if we should separate deterministic assessment from narrative
> synthesis. Trends such as PR count, commit activity, reviewer diversity,
> mentor sign-off rate, release cadence, and mailing-list participation can
> probably be computed directly by the tools or a small rule layer and
> included in the draft as factual signals. The LLM could then be used where
> it adds more value: summarising less structured mailing-list discussions,
> identifying significant proposals or unresolved issues, spotting
> communication concerns, explaining mixed signals, and turning the computed
> signals into readable report language.
>
> That would make the draft easier to review: the factual signals and
> thresholds would be explicit and auditable, while the LLM would focus on
> synthesis rather than silently inventing the assessment policy.
>
> Vladimir
>
