> I wonder if we should separate deterministic assessment from narrative synthesis.

What works very well, I think, is indeed **mixing** those. SKILLs are actually very versatile when paired with MCPs that provide good APIs hiding some of the internal complexity of the systems they interact with.

The nice thing about having some "higher abstraction" deterministic CLIs and scripts that take in some parameters is that you can use natural language to interact with them much more easily than you can use natural language to change the code and iterate over it. For example, to change the report type, aggregation method, or selection criteria, it's great to turn all possible criteria into a deterministic script that retrieves/aggregates the right data. Then, let the LLM interpret your intention when calling this particular script with a specific parameter set.

This allows you, for example, to modify "what you want" from the report just by describing it in English, while the "how you retrieve it" remains implemented in a traditional, deterministic way, which is faster, cheaper and, well, deterministic. Then, indeed, the analysis of the output should be done on the returned data, based on your expectations.

I think this is the biggest "learning" for us when developing such an agentic solution: learning when to use what:

* Hooks to add deterministic guards on certain "MUST HAVEs" or "DON'T DOs"
* SKILL English for conversational flexibility
* Deterministic code to perform fast and cheap operations that provide a good level of abstraction for the conversation
* English again to assess what comes out of it
* MCP to write multiple SKILLs using the reusable abstraction for various users with different needs

That's my current experience, so far at least: we should use the right tool for the right job; nothing really new under the sun :).

Side comment: I have a feeling that the next-level shift in our human-human discussions should be from "Architecture of code" and "Does this code look good aesthetically" to "Which tools to use for what."
I think I've had a number of discussions of this type recently and we should build our experience by trying (and failing/succeeding) with various approaches for that :)

J.

On Sat, May 30, 2026 at 3:45 PM Vladimir Sitnikov <[email protected]> wrote:

> Hi,
>
> The current prompt is a good start, but I think some of the assessment
> rules need to be made more explicit. For example, it asks the model to
> compare 3m/6m/12m windows and describe whether activity is growing, stable,
> or declining, but it does not define what those terms mean. Is a 10% change
> meaningful? 25%? Should a drop in absolute numbers matter if the number of
> contributors is still healthy? Similarly, it asks the model to surface
> health concerns, but does not define the judgment rules for when mentor
> sign-off rate, reviewer diversity, bus factor, mailing-list activity, or
> release cadence become concerns.
>
> I wonder if we should separate deterministic assessment from narrative
> synthesis. Trends such as PR count, commit activity, reviewer diversity,
> mentor sign-off rate, release cadence, and mailing-list participation can
> probably be computed directly by the tools or a small rule layer and
> included in the draft as factual signals. The LLM could then be used where
> it adds more value: summarising less structured mailing-list discussions,
> identifying significant proposals or unresolved issues, spotting
> communication concerns, explaining mixed signals, and turning the computed
> signals into readable report language.
>
> That would make the draft easier to review: the factual signals and
> thresholds would be explicit and auditable, while the LLM would focus on
> synthesis rather than silently inventing the assessment policy.
>
> Vladimir
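
To make the split discussed in this thread a bit more concrete, here is a minimal Python sketch of a "small rule layer": a deterministic function that labels a trend as growing, stable, or declining from an explicit threshold, so the threshold is a visible parameter rather than something the model decides silently. Everything in it is an assumption for illustration: the function names `classify_trend` and `build_signals`, the `previous`/`current` window layout, the zero-baseline rule, and the sample numbers are all hypothetical, and the threshold value is a parameter to be agreed on, not a proposed policy.

```python
import json


def classify_trend(previous: float, current: float, threshold: float) -> str:
    """Label the change between two windows using an explicit relative threshold."""
    if previous == 0:
        # Assumed rule for a missing baseline: any activity counts as growth.
        return "growing" if current > 0 else "stable"
    change = (current - previous) / previous
    if change >= threshold:
        return "growing"
    if change <= -threshold:
        return "declining"
    return "stable"


def build_signals(counts: dict, threshold: float) -> dict:
    """Turn raw per-metric window counts into auditable factual signals."""
    signals = {}
    for metric, windows in counts.items():
        previous, current = windows["previous"], windows["current"]
        signals[metric] = {
            "previous": previous,
            "current": current,
            "threshold": threshold,  # recorded so a reviewer can see the rule used
            "trend": classify_trend(previous, current, threshold),
        }
    return signals


# Illustrative numbers only, not real project data.
example = {"pr_count": {"previous": 40, "current": 60}}
print(json.dumps(build_signals(example, threshold=0.25), indent=2))
```

The idea would be that the LLM only picks the parameters (which metrics, which threshold) from the English request and then writes the narrative around the returned signals, while the labels themselves stay reproducible.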
