Hi Bruno, Pavel
Thanks for explaining your setups and sharing the results.
I think that to make any kind of decision, we need some degree of reproducibility (I use the term loosely here, as results from LLMs rarely reproduce exactly).
> The fact that your setup did not flag this commit as a regression shows
> that Pavel's setup is more useful.

Absolutely. That's why I am very curious about the setup.

Another experiment, with Claude Code (a closed-source agent) connected to the same local LLM as before, found the issue in commit 17dc60e624cd6fc3491f9cb002f760d60e66ce8b:
"Important caveat: The original commit 17dc60e624 introduced a bug in mbrtowc.c — it replaced MBRTOWC_EMPTY_INPUT_BUG with MBRTOC32_EMPTY_INPUT_BUG (wrong macro name) and added _GL_SMALL_WCHAR_T (mbrtoc32-specific). This was fixed by follow-up commit 2ca51a77e6. Both commits should be evaluated together."
It produced a fairly long result/summary explaining the issue in detail.

What makes Claude Code different from the pi agent (OSS) is its very verbose system prompt, which alone is more than 20,000 tokens. I did not use a skill this time. Claude took 15 minutes, vs. 3 minutes for pi in the previous experiment.
This tells me that we have to craft a proper system prompt and/or a review skill to get comparable results from an OSS agent + open-weight model.
Do we already know which way we tend to lean?
- best results using closed source, closed weights, energy intensive, costly
or
- best effort using open source, open weights, 100x (1000x?) less energy intensive, minimal costs

Does it make sense to work on a gnulib-specific AGENTS.md as well as on a system prompt that can be used for reviews?
> 15 lines of suggestions about a #if, without noticing that this #if
> condition is in fact buggy (incomplete): It needs to be enabled on all
> platforms which test pwc or n, including the cases MBRTOWC_NUL_RETVAL_BUG
> and MBRTOWC_IN_C_LOCALE_MAYBE_EILSEQ. I'm fixing that through the patch below.
Thank you, Bruno!

Regards, Tim
