Re: Concerns/questions around Software Heritage Archive

Daniel Littlewood Mon, 18 Mar 2024 12:14:45 -0700

Hi Kaelyn,

The legal question is unsettled, and litigation has been ongoing in
the US since at least 2022, brought by (at least) Matthew Butterick.
The reasonable positions I'm aware of are:


1. An LLM (or, more precisely, the set of weights that define it) is
not a derivative work of its training data, for the purposes of
copyright, and thus the license is irrelevant.
2. Producing an LLM from training data is a transformative fair use,
and thus the license is irrelevant.
3. Neither 1 nor 2 holds, and LLMs constitute copyright infringement
on a profound scale (of both copyrighted and copylefted works).

The FSF and CC have both commissioned white papers on the impact of
such considerations for Free works. I don't recall seeing anything
particularly insightful in them. Probably a waste of time to discuss
it here.

Best wishes,
Dan
