Re: request for help: LLM-based quality assurance

Tim Rühsen Mon, 08 Jun 2026 03:02:48 -0700

Hi Bruno, Pavel

Thanks for explaining your setup and showing the results.

I think that to reach any decision we need some kind of reproducibility (using the term loosely here, as results from LLMs rarely reproduce exactly).


> The fact that your setup did not flag this commit as a regression shows
> that Pavel's setup is more useful.

Absolutely. That's why I am very curious about the setup.

Another experiment with Claude Code (a closed-source agent), connected to the same local LLM as before, found the issue in commit 17dc60e624cd6fc3491f9cb002f760d60e66ce8b:

"Important caveat: The original commit 17dc60e624 introduced a bug in mbrtowc.c — it replaced MBRTOWC_EMPTY_INPUT_BUG with MBRTOC32_EMPTY_INPUT_BUG (wrong macro name) and added _GL_SMALL_WCHAR_T (mbrtoc32-specific). This was fixed by follow-up commit 2ca51a77e6. Both commits should be evaluated together."


It produced a longish result/summary explaining the issue in detail.

What makes Claude Code different from the pi agent (OSS) is its very verbose system prompt, which alone is >20,000 tokens. I did not use a skill this time. Claude took 15 minutes vs. 3 minutes for pi in the previous experiment.

This tells me that we have to craft a proper system prompt and/or a review skill to get comparable results from an OSS agent + open-weight model.


Do we already know whether we tend towards

- best results using closed source, closed weights, energy intensive, costly

or

- best effort using open source, open weights, 100x (1000x?) less energy intensive, minimal costs?

Does it make sense to work on a gnulib-specific AGENTS.md as well as on a system prompt that can be used for reviews?

> 15 lines of suggestions about a #if, without noticing that this #if condition
> is in fact buggy (incomplete): It needs to be enabled on all platforms which
> test pwc or n. Including the cases MBRTOWC_NUL_RETVAL_BUG and
> MBRTOWC_IN_C_LOCALE_MAYBE_EILSEQ. I'm fixing that through the patch below.


Thank you, Bruno!

Regards, Tim
