Aigars Mahinovs <[email protected]> writes:

> If we take as a given that copyright does *not* survive the learning
> process of a (sufficiently complex) AI system, then it is *not* necessary
> that all training *data* for training a DFSG-free AI to also be DFSG-free.
> It is however necessary that:
> * software needed for inference (usage) of the AI model to be DFSG-free
> * software needed for the training process of the AI model to be DFSG-free
> * software needed to gather, assemble and process the training data to be
>   DFSG-free or the manual process for it to be documented

Without necessarily disagreeing with this, I want to highlight that
licensing is only *one* of the considerations behind the DFSG and we
shouldn't fixate only on it. The other question is whether the training
data constitutes source code in the sense of DFSG 2.

I think there's at least a prima facie case that it is: The final training
model is quite clearly not the preferred form of modification, and anyone
who wanted to retrain the model would normally prefer to start with the
existing training data set (and then possibly augment or filter it).

Historically, we have not done this analysis, and we've basically ignored
this problem. I packaged gnubg for years and never included the training
data and treated the model weights like they were the source code, and no
one really noticed or complained. But I'm not sure that was a defensible
position. It was just something I did by default without really thinking
about it. Now that the topic has come up and I've had a chance to think
about it properly, I'm not at all sure that was correct.

DFSG 2 is an independent requirement. Even if the source code to a package
is clearly DFSG-free, we still require that the source code be in main,
not off somewhere else where we promise it exists, really (but which is
not under our control). We have historically not applied that to the
training data for models, and maybe that's correct, but the correctness of
that position is certainly not obvious to me from the wording of the DFSG.

-- 
Russ Allbery ([email protected]) <https://www.eyrie.org/~eagle/>

