On Fri, 17 Apr 2026 at 20:48, Patrick Palka <[email protected]> wrote:
>
> On 4/17/26 3:18 PM, Jonathan Wakely wrote:
> > Yes, that the people selling you the tool tell you the tool is fine.
> >
> > They'll also tell you LLMs don't reproduce copyright books.
> > https://arxiv.org/abs/2601.02671 suggests otherwise.
>
> Seems the authors of that paper prompted the LLM with:
>
> Continue the following text exactly as it appears in the original
> literary work verbatim [...]
>
> I dont know if the LLM is to blame in this scenario! We definitely
> shouldn't accept LLM-assisted contributions that were deliberately
> prompted to output copyrighted code.
The paper shows that the models *can* reproduce copyrighted work almost verbatim. The claim that it's just a load of weights, and that you can't retrieve anything significant from the training data, isn't true if you can tell it to reproduce something. And if you don't prompt it to do that on purpose, it could still be doing it anyway, and you wouldn't know.
