On Fri, 17 Apr 2026 at 14:05, Christopher Albert <[email protected]> wrote:
>
> On 4/17/26 2:10 PM, Jonathan Wakely wrote:
>
> On Fri, 17 Apr 2026 at 11:50, Christopher Albert via Gcc
> <[email protected]> wrote:
>
> On Thu, 16 Apr 2026 at 15:28, Richard Earnshaw (foss) via Gcc
> <[email protected]> wrote:
>
> On 14/03/2026 19:02, Jeffrey Law via Gcc wrote:
>
> On 3/14/2026 12:59 PM, Jerry D via Gcc wrote:
>
> Some of the various LLM services available appear to be getting very good at
> generating bug fixes. I realize that one must be careful as these tools can
> at times do things that may be superfluous to the actual fix. By superfluous
> I mean lines of code that are not relevant to the lines that fix it.
>
> I saw some discussions of this subject for gcc somewhere and wanted to know
> if we have a specific policy established / documented somewhere regarding
> this.
>
> The steering committee is trying to figure out a good policy right now.
>
> Jeff
>
> I notice that the Linux kernel recently adopted the following policy:
> https://github.com/torvalds/linux/blob/master/Documentation/process/coding-assistants.rst
>
> Has there been any progress on GCC yet?
>
> Carlos and I prepared a draft policy, but I believe the GCC steering
> committee is also looking into it. The FSF are also working on
> policies.
>
> Our draft policy takes a similar position to the kernel one: LLMs
> cannot do a DCO sign-off as their output is not copyrightable. The
> correct trailer to use is Assisted-by and not Co-authored-by. But our
> draft policy proposes *not* accepting AI-generated code, only allowing
> the use of AI for assistance, idea generation, testing, but not
> generating the actual code. That's because the legal status of
> AI-generated code is unclear, is not copyrightable, and does not meet
> the legal prerequisites for GCC contributions.
>
> I would add one practical point from recent experience.
> A substantial part of my own recent GCC contributions was only possible
> because reviewers and maintainers engaged seriously with patches I
> developed using AI tools under my direction. In at least some cases, it
> would be fair to say that this went beyond AI as pure "idea generation":
> the tools were part of the development workflow that let me produce,
> iterate on, and validate fixes much more efficiently. The patches were
> still submitted by me, reviewed by me, tested by me, and signed off by
> me, with full responsibility on my side for every line.
>
> In practice, this workflow was very successful. I do not think I could
> have fixed so many bugs, at that quality and speed, without it. I would
> therefore be cautious about a policy that is too strict. If GCC rules
> out any patch where AI contributed more than idea generation, we may
> lose an accountable workflow that has worked well here, and we risk
> falling behind projects that take a more pragmatic line.
>
> The legal picture also seems more nuanced to me than a simple rule of
> "assistance allowed, generation forbidden":
>
> * US. U.S. Copyright Office, Copyright and Artificial Intelligence,
>   Part 2: Copyrightability (Jan 2025): AI used to assist human
>   creativity does not by itself defeat protection; purely AI-generated
>   material is not protected; the analysis is case-specific and turns on
>   human authorship and control over the final expression; human
>   selection, arrangement, and modification of AI output may be
>   protected.
>   https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-2-Copyrightability-Report.pdf
> * EU. The CJEU standard is the "author's own intellectual creation",
>   that is, free and creative choices by a human author (Infopaq
>   C-5/08, Painer C-145/10, Cofemel C-683/17). The EU AI Act
>   (Reg. 2024/1689) regulates AI systems and transparency, not copyright
>   authorship.
>   https://eur-lex.europa.eu/eli/reg/2024/1689/oj
> * UK. CDPA 1988 s.9(3) assigns authorship of computer-generated works
>   to "the person by whom the arrangements necessary for the creation of
>   the work are undertaken."
>   https://www.legislation.gov.uk/ukpga/1988/48/section/9
>
> These regimes do not, in my view, support a simple categorical rule
> that any code produced with substantial AI involvement must be excluded.
> The more relevant question seems to be whether there is sufficient human
> authorship and control over the final result.
>
> I fully understand the need for caution on provenance, licensing, and
> responsibility, and I agree that an LLM cannot itself sign a DCO. But
> from my perspective, the decisive criterion should be that the human
> contributor takes full responsibility for the submitted patch:
> careful review, any necessary rewriting, testing, and sign-off,
> rather than a blanket rule that excludes code whenever AI tools played
> a substantial role somewhere in the development process.
>
> How do you, as the human in the loop, attest that the LLM output is
> not reproducing something copied almost verbatim from another project?
> If you write it, you know you didn't copy it. If the LLM writes it,
> you just have to trust that it isn't spitting out copyrighted
> material. And it's been proven that they can and do spit out
> copyrighted material.
>
> So the concern isn't only that generated code *can't* be copyrighted,
> but that it might be an unlicensed reproduction of some other code
> which already is copyrighted.
>
> The risk exists and is manageable as far as I understand.
>
> DCO sign-off is the project's mechanism for taking responsibility. When I
> sign off, I certify under DCO 1.1 that I have the right to submit the code
> under the project's licence. That applies whether I typed it, pasted it, or
> produced it with an LLM.

If you typed it, you know you typed it. If you pasted it from
elsewhere, you know you did that. If the LLM copied it from elsewhere,
how would you know?

> A human who pastes from another codebase without checking is the same
> problem, and the project handles it by putting the legal burden on the signer.

Anybody who asserts that LLM-generated code they are contributing is
not copied from the training data is making a claim they can't back
up.

> On the empirical rate, GitHub's recitation study (
> https://github.blog/ai-and-ml/github-copilot/github-copilot-research-recitation/
> ) analysed 453,780 Copilot suggestions and found 41 genuine verbatim
> reproductions of training data, about 0.009%, roughly one event per ten
> user-weeks. Almost all were code "everybody quotes": boilerplate, common
> headers, standard idioms appearing in the training corpus hundreds of times.
> In usual legal systems, you cannot put a copyright on such code or other
> short snippets, as it is not enough original intellectual work.

Yes, the people selling you the tool tell you the tool is fine.
They'll also tell you LLMs don't reproduce copyright books.
https://arxiv.org/abs/2601.02671 suggests otherwise.

> With more modern models and the highly specialized nature of GCC, I would
> expect that, if any, the LLM would reproduce code from GCC itself or other
> compilers under FOSS licenses. In the worst case, if a part slips through,
> one would add an attribution header or rewrite that piece if someone notices.
> The only FOSS compilers whose code could in principle be reproduced and would
> be license-incompatible with GCC are GPLv2-only projects such as legacy
> Open64. So I wouldn't expect any legal risk there.
>
> So in practice for the GCC project I see a very low risk, and as the person
> who signs off the contribution I am willing to take it.

That doesn't mean the project has to be willing to take it.
