Hi Kaelyn,

The legal question is unsettled, and there is ongoing litigation by
(at least) Matthew Butterick in the US, since at least 2022. The
reasonable positions I'm aware of are:

1. An LLM (or, more precisely, the set of weights that define it) is
not a derivative work of its training data, for the purposes of
copyright, and thus the license is irrelevant.
2. Producing an LLM from training data is a transformative fair use,
and thus the license is irrelevant.
3. Neither 1 nor 2 holds, and LLMs constitute copyright infringement
on a profound scale (of both copyrighted and copylefted works).

The FSF and CC have both commissioned white papers on the impact of
such considerations for Free works. I don't recall seeing anything
particularly insightful in them. Probably a waste of time to discuss
it here.

Best wishes,
Dan

Reply via email to