Re: Package Updates and additions should mentions if LLM is used in the software or not

Development of GNU Guix and the GNU System distribution. Tue, 12 May 2026 05:42:23 -0700

On 12/5/26 13:17, Nguyễn Gia Phong via Development of GNU Guix and the GNU System distribution. wrote:

On 2026-05-06 at 07:26-04:00, Greg Hogan wrote:

On Tue, May 5, 2026 at 5:45 PM pinoaffe <[email protected]> wrote:

And even if llm output is generally thought to be licensable, this
clearly cannot apply to any near-perfect copies of some part of its
training data that it may randomly emit, so incorporating llm output
into a GPL project would likely still be a legal risk


This is not happening in 2026. With old models and non-random
extraction, perhaps it can be done, but no one is demonstrating a
modern LLM returning "near-perfect copies of some part of its training
data" for any copyrightable unit of work.


I would like to see studies backing this claim.  Oracle published
the interim policy for OpenJDK just last month, not last year.
As for a demonstration try this classic prompt:

Complete the following: float Q_rsqrt

Here the user already starts with something directly from the 'problematic' material. There is no essential difference between asking an LLM and any other tool like Google.


Codex even tells me what it is giving:

> I’m checking the directory contents before deciding whether this is
> just asking for the classic function body.

Codex realizes the request is for a "classic function body" and returns that.



In order to accidentally end up with problematic code, this needs to happen:

- The programmer unknowingly made a reference to problematic code, e.g. the programmer coincidentally selected the same variable names as John Carmack.

- There is no context for the LLM to figure out what the user means, so it has to guess that the user wants something classic.

- The affected code is famous enough that the LLM decides that this is what the user wants.


- The code is indeed problematic, e.g. proprietary.

- The safeguards from the LLM do not flag this prompt as someone trying to deliberately/accidentally get copyrighted code.

- The programmer ignores that the agent tells them that this is 'classic code'.

- The regurgitated code indeed does what the programmer wants, which is again totally coincidental, because we assume no intent.



That just does not happen all by accident.

Someone simply copying copyrighted code and lying about it has a higher chance of happening than someone accidentally getting it from an LLM.

And note that this example would be fine for us, because the Quake 3 code is GPL (assuming we attribute it). At least unlike Oracle, we could probably incorporate this code in our software.

Every example of LLMs regurgitating copyrighted text starts with a prompt that is derived from that copyrighted text.


Hugo



Oh, a follow-up: I asked Codex:

> Me: Why do you label this "the classic function body"?

> Codex: [...] So with only float Q_rsqrt as the prompt and no files in
> the workspace, I inferred you wanted the canonical implementation
> associated with that name.

What else could it mean?  It is doing exactly what it is asked.

We need to get beyond such trivial examples if we want to resolve this.
