On Fri, 17 Apr 2026, Christopher Albert via Gcc wrote:
> On 4/17/26 3:18 PM, Jonathan Wakely wrote:
> > On Fri, 17 Apr 2026 at 14:05, Christopher Albert <[email protected]> wrote:
> > > On 4/17/26 2:10 PM, Jonathan Wakely wrote:
> > >
> > > On Fri, 17 Apr 2026 at 11:50, Christopher Albert via Gcc
> > > <[email protected]> wrote:
> > >
> > > On Thu, 16 Apr 2026 at 15:28, Richard Earnshaw (foss) via Gcc
> > > <[email protected]> wrote:
> > >
> > > On 14/03/2026 19:02, Jeffrey Law via Gcc wrote:
> > >
> > > On 3/14/2026 12:59 PM, Jerry D via Gcc wrote:
> > >
> > > Some of the various LLM services available appear to be getting very good
> > > at generating bug fixes. I realize that one must be careful as these tools
> > > can at times do things that may be superfluous to the actual fix. By
> > > superfluous I mean lines of code that are not relevant to the lines that
> > > fix it.
> > >
> > > I saw some discussions of this subject for gcc somewhere and wanted to
> > > know if we have a specific policy established / documented somewhere
> > > regarding this.
> > >
> > > The steering committee is trying to figure out a good policy right now.
> > >
> > > Jeff
> > >
> > > I notice that the Linux kernel recently adopted the following policy:
> > > https://github.com/torvalds/linux/blob/master/Documentation/process/coding-assistants.rst
> > >
> > > Has there been any progress on GCC yet?
> > >
> > > Carlos and I prepared a draft policy, but I believe the GCC steering
> > > committee is also looking into it. The FSF are also working on
> > > policies.
> > >
> > > Our draft policy takes a similar position to the kernel one: LLMs
> > > cannot do a DCO sign-off as their output is not copyrightable. The
> > > correct trailer to use is Assisted-by and not Co-authored-by. But our
> > > draft policy proposes *not* accepting AI-generated code, only allowing
> > > the use of AI for assistance, idea generation, testing, but not
> > > generating the actual code. That's because the legal status of
> > > AI-generated code is unclear, is not copyrightable, and does not meet
> > > the legal prerequisites for GCC contributions.
> > >
> > > I would add one practical point from recent experience.
> > > A substantial part of my own recent GCC contributions was only possible
> > > because reviewers and maintainers engaged seriously with patches I
> > > developed using AI tools under my direction. In at least some cases, it
> > > would be fair to say that this went beyond AI as pure "idea generation":
> > > the tools were part of the development workflow that let me produce,
> > > iterate on, and validate fixes much more efficiently. The patches were
> > > still submitted by me, reviewed by me, tested by me, and signed off by
> > > me, with full responsibility on my side for every line.
> > > In practice, this workflow was very successful. I do not think I could
> > > have fixed so many bugs, at that quality and speed, without it. I would
> > > therefore be cautious about a policy that is too strict. If GCC rules
> > > out any patch where AI contributed more than idea generation, we may
> > > lose an accountable workflow that has worked well here, and we risk
> > > falling behind projects that take a more pragmatic line.
> > > The legal picture also seems more nuanced to me than a simple rule of
> > > "assistance allowed, generation forbidden":
> > > *US. U.S. Copyright Office, Copyright and Artificial Intelligence,
> > > Part 2: Copyrightability (Jan 2025): AI used to assist human
> > > creativity does not by itself defeat protection; purely AI-generated
> > > material is not protected; the analysis is case-specific and turns on
> > > human authorship and control over the final expression; human
> > > selection, arrangement, and modification of AI output may be
> > > protected.
> > > https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-2-Copyrightability-Report.pdf
> > > *EU. The CJEU standard is the "author's own intellectual creation",
> > > that is, free and creative choices by a human author (Infopaq
> > > C-5/08, Painer C-145/10, Cofemel C-683/17). The EU AI Act
> > > (Reg. 2024/1689) regulates AI systems and transparency, not copyright
> > > authorship.
> > > https://eur-lex.europa.eu/eli/reg/2024/1689/oj
> > > *UK. CDPA 1988 s.9(3) assigns authorship of computer-generated works
> > > to "the person by whom the arrangements necessary for the creation of
> > > the work are undertaken."
> > > https://www.legislation.gov.uk/ukpga/1988/48/section/9
> > > These regimes do not, in my view, support a simple categorical rule
> > > that any code produced with substantial AI involvement must be excluded.
> > > The more relevant question seems to be whether there is sufficient human
> > > authorship and control over the final result.
> > > I fully understand the need for caution on provenance, licensing, and
> > > responsibility, and I agree that an LLM cannot itself sign a DCO. But
> > > from my perspective, the decisive criterion should be that the human
> > > contributor takes full responsibility for the submitted patch:
> > > careful review, any necessary rewriting, testing, and sign-off,
> > > rather than a blanket rule that excludes code whenever AI tools played
> > > a substantial role somewhere in the development process.
> > >
> > > How do you, as the human in the loop, attest that the LLM output is
> > > not reproducing something copied almost verbatim from another project?
> > > If you write it, you know you didn't copy it. If the LLM writes it,
> > > you just have to trust that it isn't spitting out copyrighted
> > > material. And it's been proven that they can and do spit out
> > > copyrighted material.
> > >
> > > So the concern isn't only that generated code *can't* be copyrighted,
> > > but that it might be an unlicensed reproduction of some other code
> > > which already is copyrighted.
> > >
> > > The risk exists and is manageable as far as I understand.
> > >
> > >
> > > DCO sign-off is the project's mechanism for taking responsibility. When I
> > > sign off, I certify under DCO 1.1 that I have the right to submit the code
> > > under the project's licence. That applies whether I typed it, pasted it,
> > > or produced it with an LLM.
> > If you typed it, you know you typed it. If you pasted it from
> > elsewhere, you know you did that.
> >
> > If the LLM copied it from elsewhere, how would you know?
> >
> > > A human who pastes from another codebase without checking is the same
> > > problem, and the project handles it by putting the legal burden on the
> > > signer.
> > Anybody who asserts that LLM-generated code they are contributing is
> > not copied from the training data is making a claim they can't back
> > up.
> >
> > >
> > > On the empirical rate, GitHub's recitation study (
> > > https://github.blog/ai-and-ml/github-copilot/github-copilot-research-recitation/
> > > ) analysed 453,780 Copilot suggestions and found 41 genuine verbatim
> > > reproductions of training data, about 0.009%, roughly one event per ten
> > > user-weeks. Almost all were code "everybody quotes": boilerplate, common
> > > headers, standard idioms appearing in the training corpus hundreds of
> > > times. In usual legal systems, you cannot put a copyright on such code or
> > > other short snippets, as it is not enough original intellectual work.
> > Yes, that's the people selling you the tool telling you the tool is fine.
> >
> > They'll also tell you LLMs don't reproduce copyrighted books.
> > https://arxiv.org/abs/2601.02671 suggests otherwise.

It seems the authors of that paper prompted the LLM with:

  Continue the following text exactly as it appears in the original
  literary work verbatim [...]

I don't know if the LLM is to blame in this scenario! We definitely
shouldn't accept LLM-assisted contributions that were deliberately prompted
to output copyrighted code.

> >
> > >
> > > With more modern models and the highly specialized nature of GCC, I would
> > > expect that, if any, the LLM would reproduce code from GCC itself or other
> > > compilers under FOSS licenses. In the worst case, if a part slips through,
> > > one would add an attribution header or rewrite that piece if someone
> > > notices. The only FOSS compilers whose code could in principle be
> > > reproduced and would be license-incompatible with GCC are GPLv2-only
> > > projects such as legacy Open64. So I wouldn't expect any legal risk there.
> > >
> > >
> > > So in practice for the GCC project I see a very low risk, and as the
> > > person who signs off the contribution I am willing to take it.
> > That doesn't mean the project has to be willing to take it.
>
> Thank you, Jonathan, I get your point and I have made mine. I think this sums
> up the whole conflict area in which the project has to navigate.
>
>
> Best
> Chris
