Re: Policy on use of LLM tools and bug fixes

Patrick Palka via Gcc Fri, 17 Apr 2026 12:48:27 -0700

On Fri, 17 Apr 2026, Christopher Albert via Gcc wrote:

> On 4/17/26 3:18 PM, Jonathan Wakely wrote:
> > On Fri, 17 Apr 2026 at 14:05, Christopher Albert <[email protected]> wrote:
> > > On 4/17/26 2:10 PM, Jonathan Wakely wrote:
> > > 
> > > On Fri, 17 Apr 2026 at 11:50, Christopher Albert via Gcc
> > > <[email protected]> wrote:
> > > 
> > > On Thu, 16 Apr 2026 at 15:28, Richard Earnshaw (foss) via Gcc
> > > <[email protected]> wrote:
> > > 
> > > On 14/03/2026 19:02, Jeffrey Law via Gcc wrote:
> > > 
> > > On 3/14/2026 12:59 PM, Jerry D via Gcc wrote:
> > > 
> > > Some of the various LLM services available appear to be getting very good
> > > at generating bug fixes. I realize that one must be careful as these tools
> > > can at times do things that may be superfluous to the actual fix. By
> > > superfluous I mean lines of code that are not relevant to the lines that
> > > fix it.
> > > 
> > > I saw some discussions of this subject for gcc somewhere and wanted to
> > > know if we have a specific policy established / documented somewhere
> > > regarding this.
> > > 
> > > The steering committee is trying to figure out a good policy right now.
> > > 
> > > Jeff
> > > 
> > > I notice that the Linux kernel recently adopted the following policy:
> > > https://github.com/torvalds/linux/blob/master/Documentation/process/coding-assistants.rst
> > > 
> > > Has there been any progress on GCC yet?
> > > 
> > > Carlos and I prepared a draft policy, but I believe the GCC steering
> > > committee is also looking into it. The FSF are also working on
> > > policies.
> > > 
> > > Our draft policy takes a similar position to the kernel one: LLMs
> > > cannot do a DCO sign-off as their output is not copyrightable. The
> > > correct trailer to use is Assisted-by and not Co-authored-by. But our
> > > draft policy proposes *not* accepting AI-generated code, only allowing
> > > the use of AI for assistance, idea generation, testing, but not
> > > generating the actual code. That's because AI-generated code has an
> > > unclear legal status, is not copyrightable, and does not meet
> > > the legal prerequisites for GCC contributions.
> > > 
> > > I would add one practical point from recent experience.
> > > A substantial part of my own recent GCC contributions was only possible
> > > because reviewers and maintainers engaged seriously with patches I
> > > developed using AI tools under my direction. In at least some cases, it
> > > would be fair to say that this went beyond AI as pure "idea generation":
> > > the tools were part of the development workflow that let me produce,
> > > iterate on, and validate fixes much more efficiently. The patches were
> > > still submitted by me, reviewed by me, tested by me, and signed off by
> > > me, with full responsibility on my side for every line.
> > > In practice, this workflow was very successful. I do not think I could
> > > have fixed so many bugs, at that quality and speed, without it. I would
> > > therefore be cautious about a policy that is too strict. If GCC rules
> > > out any patch where AI contributed more than idea generation, we may
> > > lose an accountable workflow that has worked well here, and we risk
> > > falling behind projects that take a more pragmatic line.
> > > The legal picture also seems more nuanced to me than a simple rule of
> > > "assistance allowed, generation forbidden":
> > > * US. U.S. Copyright Office, Copyright and Artificial Intelligence,
> > >   Part 2: Copyrightability (Jan 2025): AI used to assist human
> > >   creativity does not by itself defeat protection; purely AI-generated
> > >   material is not protected; the analysis is case-specific and turns on
> > >   human authorship and control over the final expression; human
> > >   selection, arrangement, and modification of AI output may be
> > >   protected.
> > >   https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-2-Copyrightability-Report.pdf
> > > * EU. The CJEU standard is the "author's own intellectual creation",
> > >   that is, free and creative choices by a human author (Infopaq
> > >   C-5/08, Painer C-145/10, Cofemel C-683/17). The EU AI Act
> > >   (Reg. 2024/1689) regulates AI systems and transparency, not copyright
> > >   authorship.
> > >   https://eur-lex.europa.eu/eli/reg/2024/1689/oj
> > > * UK. CDPA 1988 s.9(3) assigns authorship of computer-generated works
> > >   to "the person by whom the arrangements necessary for the creation of
> > >   the work are undertaken."
> > >   https://www.legislation.gov.uk/ukpga/1988/48/section/9
> > > These regimes do not, in my view, support a simple categorical rule
> > > that any code produced with substantial AI involvement must be excluded.
> > > The more relevant question seems to be whether there is sufficient human
> > > authorship and control over the final result.
> > > I fully understand the need for caution on provenance, licensing, and
> > > responsibility, and I agree that an LLM cannot itself sign a DCO. But
> > > from my perspective, the decisive criterion should be that the human
> > > contributor takes full responsibility for the submitted patch:
> > > careful review, any necessary rewriting, testing, and sign-off,
> > > rather than a blanket rule that excludes code whenever AI tools played
> > > a substantial role somewhere in the development process.
> > > 
> > > How do you, as the human in the loop, attest that the LLM output is
> > > not reproducing something copied almost verbatim from another project?
> > > If you write it, you know you didn't copy it. If the LLM writes it,
> > > you just have to trust that it isn't spitting out copyrighted
> > > material. And it's been proven that they can and do spit out
> > > copyrighted material.
> > > 
> > > So the concern isn't only that generated code *can't* be copyrighted,
> > > but that it might be an unlicensed reproduction of some other code
> > > which already is copyrighted.
> > > 
> > > The risk exists and is manageable as far as I understand.
> > > 
> > > 
> > > DCO sign-off is the project's mechanism for taking responsibility. When I
> > > sign off, I certify under DCO 1.1 that I have the right to submit the code
> > > under the project's licence. That applies whether I typed it, pasted it,
> > > or produced it with an LLM.
> > If you typed it, you know you typed it. If you pasted it from
> > elsewhere, you know you did that.
> > 
> > If the LLM copied it from elsewhere, how would you know?
> > 
> > > A human who pastes from another codebase without checking is the same
> > > problem, and the project handles it by putting the legal burden on the
> > > signer.
> > Anybody who asserts that LLM-generated code they are contributing is
> > not copied from the training data is making a claim they can't back
> > up.
> > 
> > > 
> > > On the empirical rate, GitHub's recitation study (
> > > https://github.blog/ai-and-ml/github-copilot/github-copilot-research-recitation/
> > > ) analysed 453,780 Copilot suggestions and found 41 genuine verbatim
> > > reproductions of training data, about 0.009%, roughly one event per ten
> > > user-weeks. Almost all were code "everybody quotes": boilerplate, common
> > > headers, standard idioms appearing in the training corpus hundreds of
> > > times. In most legal systems, such code and other short snippets cannot
> > > be copyrighted, because they do not embody enough original intellectual work.
> > Yes, the people selling you the tool tell you the tool is fine.
> > 
> > They'll also tell you LLMs don't reproduce copyrighted books.
> > https://arxiv.org/abs/2601.02671 suggests otherwise.


Seems the authors of that paper prompted the LLM with:

  Continue the following text exactly as it appears in the original
  literary work verbatim [...]

I don't know if the LLM is to blame in this scenario!  We definitely
shouldn't accept LLM-assisted contributions that were deliberately
prompted to output copyrighted code.

> > 
> > > 
> > > With more modern models and the highly specialized nature of GCC, I would
> > > expect that the LLM would reproduce, if anything, code from GCC itself or other
> > > compilers under FOSS licenses. In the worst case, if a part slips through,
> > > one would add an attribution header or rewrite that piece if someone
> > > notices. The only FOSS compilers whose code could in principle be
> > > reproduced and would be license-incompatible with GCC are GPLv2-only
> > > projects such as legacy Open64. So I wouldn't expect any legal risk there.
> > > 
> > > 
> > > So in practice for the GCC project I see a very low risk, and as the
> > > person who signs off the contribution I am willing to take it.
> > That doesn't mean the project has to be willing to take it.
> 
> Thank you, Jonathan, I get your point and I have made mine. I think this sums
> up the whole area of conflict that the project has to navigate.
> 
> 
> Best
> Chris
> 
>
