Hi Kaelyn,

The legal question is unsettled, and litigation has been ongoing in the US since at least 2022, brought by (at least) Matthew Butterick. The reasonable positions I'm aware of are:

1. An LLM (or, more precisely, the set of weights that define it) is not a derivative work of its training data, for the purposes of copyright, and thus the license is irrelevant.

2. Producing an LLM from training data is a transformative fair use, and thus the license is irrelevant.

3. Neither 1 nor 2 holds, and LLMs constitute copyright infringement on a profound scale (of both copyrighted and copylefted works).

The FSF and CC have both commissioned white papers on the impact of such considerations for Free works. I don't recall seeing anything particularly insightful in them. Probably a waste of time to discuss it here.

Best wishes,
Dan