On Thu, May 08, 2025 at 05:38:52PM +0300, Stefano Zacchiroli wrote:
What I strongly suspect would happen, if proposal A wins (which I also consider quite likely), is that Debian maintainers of free software products that use trained ML models lacking DFSG-free training data will have to go down the rabbit hole of patching that software to systematically download the models on first use. Or just give up on maintaining those packages, of course.
That seems like widespread failure to me, but I'm still hoping that someone who supports either Mo's or Thorsten's proposal will articulate a better vision.
On Thu, May 08, 2025 at 09:42:25AM -0700, Russ Allbery wrote:
I don't understand why machine learning models are any different. Or, rather, I understand why they're different to people who truly believe they really are free software. That argument makes sense to me; I just don't agree with it. But I don't understand the argument if one agrees that models without training data are non-free.
I'm not sure that these are quite the right terms. This email itself is non-free software, but if Sam wants to train some kind of deep learning model on it and release the model, without training data, under the Expat license, I definitely would not refer to the model as non-free. Would I prefer that copyright law be abolished and there be no impediments to providing the training data as well? Of course I would. But, absent that, there would be no way for Sam to distribute the training data as free software. To free some non-free firmware, in theory, the copyright holders just need to be motivated enough to do it. To free Sam's hypothetical email corpus, you would have to convince every single email author, including the spammers, to relicense. One of them is more of a pipe dream than the other.
Maybe the answer is that they're just too useful to the distribution to not package regardless of our opinions about whether they're free software. User experience and free software principles *are* often in tension and it's fine for us to shift that balance, in my opinion. But I guess I would have expected us to do that via a mechanism similar to non-free-firmware if we wanted to make it easy for users to use software that is OSAID-approved but not DFSG-free, at least if we have a lot of it.
Maybe that is what we should be doing; I'm not sure.

