On Fri, Feb 20, 2026 at 01:39:10PM +0000, Jonathan Dowland wrote:
> An LLM which was solely trained on a corpus of free software with
> intra-compatible licensing
That's not really going to be practical; in practice an LLM is trained
on a very large corpus of information, including things that are not
code, so that it can understand a plain English text prompt. This
includes, but is not limited to, mailing list archives, where the
software licensing of code fragments is not necessarily going to be
clear.
I would also suggest that we are holding LLMs to a much higher
standard than we are for human beings. For example, a student might
have spent time learning how to code as part of their university
education by reading and studying books such as Robert Sedgewick's
Algorithms. The examples on how to write an insertion sort, or a heap
sort, are not necessarily going to be licensed under an open source
license. Yet we don't consider said human being to be hopelessly
tainted because they happen to have been trained on a textbook which
contained potentially non-free software.
Or consider a human being with decades of experience who, in the past,
may have had access to OS sources from, say, Solaris, OSF/1, AIX,
Irix, and DYNIX as part of their previous employment. That human
being has no doubt learned an awful lot about how systems work from
their exposure to those systems and how they were implemented. That
could arguably be considered "training". Are they therefore ineligible to
contribute to Debian? I would hope not!
Cheers,
- Ted