Re: Policy on use of LLM tools and bug fixes

Christopher Albert via Gcc Fri, 17 Apr 2026 06:06:31 -0700

On 4/17/26 2:10 PM, Jonathan Wakely wrote:

On Fri, 17 Apr 2026 at 11:50, Christopher Albert via Gcc
<[email protected]> wrote:

On Thu, 16 Apr 2026 at 15:28, Richard Earnshaw (foss) via Gcc
<[email protected]> wrote:

On 14/03/2026 19:02, Jeffrey Law via Gcc wrote:


On 3/14/2026 12:59 PM, Jerry D via Gcc wrote:

Some of the various LLM services available appear to be getting very good at 
generating bug fixes. I realize that one must be careful as these tools can at 
times do things that may be superfluous to the actual fix. By superfluous I 
mean lines of code that are not relevant to the lines that fix it.

I saw some discussions of this subject for gcc somewhere and wanted to know if 
we have a specific policy established / documented somewhere regarding this.

The steering committee is trying to figure out a good policy right now.

Jeff

I notice that the Linux kernel recently adopted the following policy:
https://github.com/torvalds/linux/blob/master/Documentation/process/coding-assistants.rst

Has there been any progress on GCC yet?

Carlos and I prepared a draft policy, but I believe the GCC steering
committee is also looking into it. The FSF are also working on
policies.

Our draft policy takes a similar position to the kernel one: LLMs
cannot do a DCO sign-off as their output is not copyrightable. The
correct trailer to use is Assisted-by and not Co-authored-by. But our
draft policy proposes *not* accepting AI-generated code, only allowing
the use of AI for assistance, idea generation, testing, but not
generating the actual code. That's because the legal status of
AI-generated code is unclear, is not copyrightable, and does not meet
the legal prerequisites for GCC contributions.

I would add one practical point from recent experience.
A substantial part of my own recent GCC contributions was only possible
because reviewers and maintainers engaged seriously with patches I
developed using AI tools under my direction. In at least some cases, it
would be fair to say that this went beyond AI as pure "idea generation":
the tools were part of the development workflow that let me produce,
iterate on, and validate fixes much more efficiently. The patches were
still submitted by me, reviewed by me, tested by me, and signed off by
me, with full responsibility on my side for every line.
In practice, this workflow was very successful. I do not think I could
have fixed so many bugs, at that quality and speed, without it. I would
therefore be cautious about a policy that is too strict. If GCC rules
out any patch where AI contributed more than idea generation, we may
lose an accountable workflow that has worked well here, and we risk
falling behind projects that take a more pragmatic line.
The legal picture also seems more nuanced to me than a simple rule of
"assistance allowed, generation forbidden":
* US. U.S. Copyright Office, Copyright and Artificial Intelligence,
Part 2: Copyrightability (Jan 2025): AI used to assist human
creativity does not by itself defeat protection; purely AI-generated
material is not protected; the analysis is case-specific and turns on
human authorship and control over the final expression; human
selection, arrangement, and modification of AI output may be
protected.
https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-2-Copyrightability-Report.pdf
* EU. The CJEU standard is the "author's own intellectual creation",
that is, free and creative choices by a human author (Infopaq
C-5/08, Painer C-145/10, Cofemel C-683/17). The EU AI Act
(Reg. 2024/1689) regulates AI systems and transparency, not copyright
authorship.
https://eur-lex.europa.eu/eli/reg/2024/1689/oj
* UK. CDPA 1988 s.9(3) assigns authorship of computer-generated works
to "the person by whom the arrangements necessary for the creation of
the work are undertaken."
https://www.legislation.gov.uk/ukpga/1988/48/section/9
These regimes do not, in my view, support a simple categorical rule
that any code produced with substantial AI involvement must be excluded.
The more relevant question seems to be whether there is sufficient human
authorship and control over the final result.
I fully understand the need for caution on provenance, licensing, and
responsibility, and I agree that an LLM cannot itself sign a DCO. But
from my perspective, the decisive criterion should be that the human
contributor takes full responsibility for the submitted patch:
careful review, any necessary rewriting, testing, and sign-off,
rather than a blanket rule that excludes code whenever AI tools played
a substantial role somewhere in the development process.

How do you, as the human in the loop, attest that the LLM output is
not reproducing something copied almost verbatim from another project?
If you write it, you know you didn't copy it. If the LLM writes it,
you just have to trust that it isn't spitting out copyrighted
material. And it's been proven that they can and do spit out
copyrighted material.

So the concern isn't only that generated code *can't* be copyrighted,
but that it might be an unlicensed reproduction of some other code
which already is copyrighted.


The risk exists but, as far as I understand, it is manageable.

DCO sign-off is the project's mechanism for taking responsibility. When
I sign off, I certify under DCO 1.1 that I have the right to submit the
code under the project's licence. That applies whether I typed it,
pasted it, or produced it with an LLM. A human who pastes from another
codebase without checking is the same problem, and the project handles
it by putting the legal burden on the signer.

On the empirical rate, GitHub's recitation study
(https://github.blog/ai-and-ml/github-copilot/github-copilot-research-recitation/)
analysed 453,780 Copilot suggestions and found 41 genuine verbatim
reproductions of training data, about 0.009%, roughly one event per ten
user-weeks. Almost all were code "everybody quotes": boilerplate, common
headers, standard idioms appearing in the training corpus hundreds of
times. In most legal systems, such code and other short snippets cannot
be copyrighted, as they do not embody enough original intellectual work.
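
As a quick sanity check on the percentage quoted above, here is a minimal
sketch. The function name is hypothetical, and the two counts are simply
the figures cited from the study; the "per ten user-weeks" figure depends
on usage data not given here and is not recomputed.

```python
def recitation_rate_percent(verbatim_events: int, suggestions: int) -> float:
    """Return the share of suggestions that were verbatim reproductions, in percent."""
    if suggestions <= 0:
        raise ValueError("suggestions must be positive")
    return 100.0 * verbatim_events / suggestions


# Counts as quoted in this message (41 events out of 453,780 suggestions).
rate = recitation_rate_percent(41, 453_780)
print(f"{rate:.3f}%")  # prints "0.009%"
```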

With more modern models and the highly specialized nature of GCC, I
would expect that the LLM would, if anything, reproduce code from GCC
itself or other compilers under FOSS licenses. In the worst case, if a
part slips through, one would add an attribution header or rewrite that
piece if someone notices. The only FOSS compilers whose code could in
principle be reproduced and would be license-incompatible with GCC are
GPLv2-only projects such as legacy Open64. So I wouldn't expect any
legal risk there.

So in practice for the GCC project I see a very low risk, and as the
person who signs off the contribution I am willing to take it.
