Hi Florian,

On Fri, 24 Apr 2026 12:48:42 +0200 Florian Weimer wrote:
> Contributions created using machine learning techniques go back to the
> 2000s at least, I think.

That's why I didn't write either "AI" or "Machine Learning".

Large Language Models are a pretty well-defined class of software these
days. Albert mentioned Claude and Copilot, for example.

> Could you define what, exactly, “LLM-generated” means?

Any sequence of bytes computed by a local or remote large language model
that has been included in the GCC repository.

I care most about those sequences of bytes that influence the
compilation process and its output binaries (code, .po, etc.).

You know, trusting trust and so forth...

> Tools for vibe-coding test cases are well-established within the GCC
> community.

Good! They are well-established in our company too.

Actually, since the xz-utils based attack, we consider open source test
suites among the attack vectors.

As for our own generated tests, we have policies about how to commit
them: in dedicated commits, with detailed tool info and the full prompt.

While it might seem excessive to hobbyist developers, on several
occasions reviewers found weird mismatches between the prompt and the
generated (passing) tests. In a couple of cases, though, only branch
coverage analysis made us realize that use cases we considered properly
tested were not, because the test code diverged from the prompt.

In any case, my question to Albert (and other LLM users contributing to
GCC) is just about LLM output that was included in the GCC codebase and
influences the build output and build process.

I hope GCC policy will cover statistically generated tests too, and fwiw
I can say that recording test-related prompts is useful in the long run,
but that's up to the SC to decide.


Giacomo
