Hi Tim,

> Another experiment with Claude code (closed source agent) connected to
> the same local LLM as before found the issue in commit
> 17dc60e624cd6fc3491f9cb002f760d60e66ce8b:
>
> "Important caveat: The original commit 17dc60e624 introduced a bug in
> mbrtowc.c — it replaced MBRTOWC_EMPTY_INPUT_BUG with
> MBRTOC32_EMPTY_INPUT_BUG (wrong macro name) and added _GL_SMALL_WCHAR_T
> (mbrtoc32-specific). This was fixed by follow-up commit 2ca51a77e6. Both
> commits should be evaluated together."
Well, if it knows about the follow-up commit, we don't know whether it
found the regression independently of that.

> It produced a longish result/summary explaining the issue in details.

Too many details are not advisable. After all, what is cheaper: reviewing
a 20-line patch or reading a 50-line analysis report?

> This tells me, that we have to craft a proper system prompt and/or a
> review skill to get comparable results from an oss agent + open weight
> model.

That's quite likely, yes.

> Do we already know whether we tend to
> - best results using closed source, closed weights, energy intensive,
> costly
> or
> - best effort using open source, open weights, 100x (1000x?) less
> energy intensive, minimal costs
> ?

In general, we should strive for locally running open-weights models.
If it's not running locally, it's SaaS, and what people report is that
the quality slowly gets worse over time until a new version of the LLM
is released, at which point the quality rises again, but with
unpredictable effects on the particular workflow.

Occasionally, the closed-source alternative is more usable than the
open-source one. For instance, I did not have a good experience with
GitLab CI (open-source), because I could not find out how to store large
log files in the case of failed builds. In GitHub, the handling of log
files is not perfect either, but at least it is reasonable.

But here, in the LLM space, the situation is different: if you stick to
a SaaS model, you are forced to update the prompt or AGENTS.md file
every two months, which may be unreasonably costly in the long run.

Bruno
