Re: request for help: LLM-based quality assurance

Bruno Haible via Gnulib discussion list Mon, 08 Jun 2026 07:50:25 -0700

Hi Tim,

> Another experiment with Claude code (closed source agent) connected to 
> the same local LLM as before found the issue in commit 
> 17dc60e624cd6fc3491f9cb002f760d60e66ce8b:
> 
> "Important caveat: The original commit 17dc60e624 introduced a bug in 
> mbrtowc.c — it replaced MBRTOWC_EMPTY_INPUT_BUG with 
> MBRTOC32_EMPTY_INPUT_BUG (wrong macro name) and added _GL_SMALL_WCHAR_T 
> (mbrtoc32-specific). This was fixed by follow-up commit 2ca51a77e6. Both 
> commits should be evaluated together."


Well, if it knows about the follow-up commit, we don't know whether it
found the regression independently of that.

> It produced a longish result/summary explaining the issue in details.

Too much detail is not advisable. After all, what is cheaper: reviewing
a 20-line patch or reading a 50-line analysis report?

> This tells me, that we have to craft a proper system prompt and/or a 
> review skill to get comparable results from an oss agent + open weight 
> model.

That's quite likely, yes.

> Do we already know whether we tend to
>    - best results using closed source, closed weights, energy intensive, 
> costly
> or
>    - best effort using open source, open weights, 100x (1000x?) less 
> energy intensive, minimal costs
> ?

In general, we should strive for locally running open-weights models.
If a model is not running locally, it's SaaS, and what people report is that
its quality slowly gets worse over time, until a new version of the LLM
is released, at which point the quality rises again, but with unpredictable
effects on the particular workflow.

Occasionally the closed-source alternative is more usable
than the open-source one. For instance, I did not have a good experience
with GitLab CI (open-source), because I could not find out how to store
large log files in the case of failed builds, whereas on GitHub the handling
of the log files is not perfect either, but at least reasonable.

But here, in the LLM space, the situation is different: If you stick
to a SaaS model, you are forced to update the prompt or AGENTS.md file
every two months, which may be unreasonably costly in the long run.

Bruno
