Hi,
On 2026-02-21 20:14, Craig Brozefsky wrote:
Ekaitz Zarraga <[email protected]> writes:
The process is different from the perspective of the person who writes
the code: they may not be aware of the copyright violations, the
hallucinations, etc., that the LLM will produce. From our side it doesn't
really matter much.
I largely agree with your line of argument regarding detection of LLM
usage, quality, and fit.
However, I would suggest that there are at least two categorical legal
differences from the project's perspective, which are independent of
observability:
- LLM-generated code is not copyrightable in some jurisdictions
(e.g. the US). This is a risk to the project's ability to enforce its
copyleft license terms on the aggregate work.
- There is in effect dual authorship. One of the authors is
categorically unaware of its infringement -- and is literally a lossy
recall mechanism trained on copyrighted code. This brings a
considerably higher risk of infringement, which is *independent* of
our ability to detect it, and is the *real* measure of the risk of
infringement by the project.
Here I'm not sure how the thing works, but I believe that when people
send us code they sign it as theirs, so if there is any infringement,
it is theirs.
If they took it from the internet (say, Stack Overflow) the issue is
similar: they would probably forget where they took it from, and that
piece of code could also be under someone else's copyright. The fact
that there's an opaque machine in the middle makes the issue worse, but
it's a problem we already had and didn't pay much attention to.
We assumed good faith; maybe we shouldn't have.
A different story is whether we, as a community, actually want to share
our *opinion* about the usage of LLMs, or whether we want to ask
contributors to say if their contribution was made using an LLM, just
to adjust the review criteria for those contributions (they could
still lie, though).
I think it's reasonable, and prudent, to ask contributors to correctly
present the authorship of the code and its copyrightability. If an LLM
is involved, then I think we may either need to ask the co-author to
perform a review for infringement, or reject it outright. Such a
policy could mitigate the increased risk of infringement and protect
our copyrightability, which is the foundation of our ability to assert
copyleft license terms. [1]
We already ask for the authorship and the copyrightability, don't we?
Isn't that what we, the committers, sign for? (I'm talking about
signing off the commits, which is not required for our own commits.)
Maybe I misunderstood my mission! But I'd say we sign to confirm that
we were given permission to add that piece of code to the project, thus
accepting the terms and implying that the person who sent the patch is
legally able to make that decision over the proposed changes.
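For illustration, and assuming what we mean here is the standard Git
sign-off mechanism (the contributor name below is made up), the trailer
is added with "git commit -s" and ends up as the last line of the
commit message:

    $ git commit -s -m "Fix the foo builder."

    resulting commit message:

        Fix the foo builder.

        Signed-off-by: Jane Hacker <[email protected]>

In projects that adopt the Developer Certificate of Origin, that
trailer is precisely the assertion that the signer has the right to
submit the change under the project's license, which is the kind of
statement we are discussing.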
Maybe we can just state it more clearly in the docs, explicitly
mentioning that one can only contribute code they own or have the
rights over.
This would in practice ban LLM-generated code that is just copy-pasted,
but it would also help with other problematic cases we were overlooking
(people sharing code they wrote on company time, or code written by
their employees, or copied from somewhere else...).
Again, I think we are already doing this, so it wouldn't change much.
I'm not sure, though, so please correct me if I'm wrong here.
I appreciate the collegial and measured tone of the discussion on this
topic.
I agree.