On Fri, Apr 17, 2026 at 10:00 PM Jonathan Wakely via Gcc
<[email protected]> wrote:
>
> On Fri, 17 Apr 2026 at 20:48, Patrick Palka <[email protected]> wrote:
> >
> > > On 4/17/26 3:18 PM, Jonathan Wakely wrote:
> > > > Yes, that the people selling you the tool tell you the tool is fine.
> > > >
> > > > They'll also tell you LLMs don't reproduce copyright books.
> > > > https://arxiv.org/abs/2601.02671 suggests otherwise.
> >
> > Seems the authors of that paper prompted the LLM with:
> >
> >   Continue the following text exactly as it appears in the original
> >   literary work verbatim [...]
> >
> > I don't know if the LLM is to blame in this scenario!  We definitely
> > shouldn't accept LLM-assisted contributions that were deliberately
> > prompted to output copyrighted code.
>
> The paper shows that the models *can* reproduce copyrighted work
> almost verbatim. Claiming that it's just a load of weights and you
> can't retrieve anything significant from the training data isn't true
> if you can tell it to reproduce something. And if you don't prompt it
> to do that on purpose, you don't know that it isn't still doing it
> anyway.

But of course GCC code that compiles is unlikely to be a verbatim
copy of source code that has an incompatible license.  So it's not about
verbatim copying but instead about copying "concepts", which then blurs
the line between derived works and even software patents?  Both are
a problem for human-contributed code as well, depending on whether
the burden of proof is on the side of the coder or the possible legal
opponent (as to whether GCC code is a derived work of somebody else's
copyrighted code).  Whether "I used LLMs" is sufficient prima facie in
that regard needs to wait for legal precedent ...

Richard.
