On 2026-02-20 20:09, Jonathan Dowland wrote:
> On Fri Feb 20, 2026 at 4:43 PM GMT, Christian Kastner wrote:
>> One, I don't think we have the standing to *on principle* decide that
>> all outputs are derivative, in the legal sense of the term. That will be
>> decided by courts and/or legislation.
> 
> We are taking a view, one way or another, ahead of any legislation, by
> our actions. I don't believe we need wait for legal precedent (although
> it would be wise to predict likely outcomes, where possible, and plan
> for the impact of them).
> 
> If someone cooked up a new software license, we would evaluate whether
> it was DFSG free and accept or reject corresponding software on that
> basis, without waiting for it to be tested in court.

I think we are on the same page here. A typo garbled my previous message
(I meant "rulings" instead of "rules"), but rulings so far have been
mostly in favor of the LLM side.

Now the following is a dangerous argument to make, because it sounds
like "everyone is doing it", but that's not at all the argument I'm
trying to make:

The fact that practically every large corporation is so enthusiastically
advertising just how much it's using LLMs for development work suggests
that either their otherwise hypersensitive legal departments have
collectively shut down, or these departments have concluded that there
is no meaningful risk.

>> There has to be some meaningful relationship between the inputs for
>> which authorship is being claimed, and the outputs.
> 
> The meaningful relationship is the LLM: it's not magic, it's a (very
> complex) mapping of input to output.

They are autoregressive models that produce output based essentially on
statistics computed from a massive number of inputs, not from any one
input.

Using my previous example, I would agree with the "mapping of input to
output" characterization if you could show an actual mapping from your
foo.c to my bar.c taking place. And I acknowledged that this would be a
problem ("reproductions of copyrighted materials"), and this type of
memorization has indeed been a problem in practice. But if you couldn't
show that mapping, then how could bar.c be derived from foo.c?

OTOH, in cases without reproduction, I dispute the mapping
characterization because these models do capture generic patterns as
well. For example, they are good at writing simple for-loops because
they've collected statistics on billions of them, not because any one of
them was some sort of original inspiration.
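
As a toy illustration of the statistics point, here is a minimal sketch
of a bigram counter with greedy autoregressive decoding. It is nothing
like a real LLM in scale or architecture, and the corpus and the names
CORPUS, train_bigram and generate are made up for this example. With
this particular corpus, the generated loop header matches none of the
four inputs; each token is chosen because it is the most frequent
successor across all of them.

```python
from collections import Counter, defaultdict

# Hypothetical four-line "training set" of whitespace-tokenized loop headers.
CORPUS = [
    "for i in range ( m ) :",
    "for i in items :",
    "for j in range ( n ) :",
    "for k in range ( n ) :",
]


def train_bigram(corpus):
    """Count, for each token, how often each next token follows it."""
    counts = defaultdict(Counter)
    for text in corpus:
        tokens = text.split()
        for cur, nxt in zip(tokens, tokens[1:]):
            counts[cur][nxt] += 1
    return counts


def generate(counts, start, max_tokens=20):
    """Greedy decoding: repeatedly append the most frequent successor."""
    out = [start]
    while len(out) < max_tokens:
        successors = counts.get(out[-1])
        if not successors:
            break  # no token was ever observed after this one
        out.append(successors.most_common(1)[0][0])
    return " ".join(out)


if __name__ == "__main__":
    counts = train_bigram(CORPUS)
    print(generate(counts, "for"))
```

A model this small will of course also reproduce its inputs verbatim if
the corpus is skewed enough, which is the memorization case discussed
above.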

In relation to this, I'd also like to refer to problems #1 and #2 that I
presented in [1].

Best,
Christian

[1]: https://lists.debian.org/debian-project/2026/02/msg00102.html
