Hi Bruno, Pavel

Thanks for explaining your setup and showing the results.

I think that to reach any kind of decision we need some degree of reproducibility (using the term loosely here, as results from LLMs rarely reproduce exactly).

> The fact that your setup did not flag this commit as a regression shows
> that Pavel's setup is more useful.

Absolutely. That's why I am very curious about the setup.

Another experiment with Claude Code (a closed-source agent), connected to the same local LLM as before, found the issue in commit 17dc60e624cd6fc3491f9cb002f760d60e66ce8b:

"Important caveat: The original commit 17dc60e624 introduced a bug in mbrtowc.c — it replaced MBRTOWC_EMPTY_INPUT_BUG with MBRTOC32_EMPTY_INPUT_BUG (wrong macro name) and added _GL_SMALL_WCHAR_T (mbrtoc32-specific). This was fixed by follow-up commit 2ca51a77e6. Both commits should be evaluated together."

It produced a longish result/summary explaining the issue in detail.
What makes Claude Code different from the pi agent (OSS) is its very verbose system prompt, which alone is >20,000 tokens. I did not use a skill this time. Claude took 15 minutes vs. pi taking 3 minutes in the previous experiment.

This tells me that we have to craft a proper system prompt and/or a review skill to get comparable results from an OSS agent + open-weight model.

Do we already know which of these we lean towards?
- best results: closed source, closed weights, energy intensive, costly
or
- best effort: open source, open weights, 100x (1000x?) less energy intensive, minimal cost

Does it make sense to work on a gnulib specific AGENTS.md as well as on a system prompt that can be used for reviews?

> 15 lines of suggestions about a #if, without noticing that this #if condition
> is in fact buggy (incomplete): It needs to be enabled on all platforms which
> test pwc or n. Including the cases MBRTOWC_NUL_RETVAL_BUG and
> MBRTOWC_IN_C_LOCALE_MAYBE_EILSEQ. I'm fixing that through the patch below.

Thank you, Bruno!

Regards, Tim

