On Fri, 2022-01-14 at 15:35 +0100, Gard Spreemann wrote:
> I understand how you reach these conclusions, both from the POV of
> hardware driver non-freedom and from the POV of the toxic candy problem
> of trained models. And while I agree with your conclusions, I do worry
> about the prospect of the lines blurring.
Indeed. But I eventually figured out that "lazy evaluation" on this
problem is the most realistic strategy for distribution developers. I'm
not worried about it; see the reasons below.

> It's not unreasonable to expect that AI models become standard
> components of certain classes of software relatively soon. No matter
[...]
> What do we do if/when an image compression scheme involving a deep
> learning model becomes popular? What do we do if/when every new FOSS
> game ships with an RL agent that takes 80 GPU-weeks of training to
> reproduce (and upstream supports nvidia only)? When every new text
> editor comes with an autocompleter based on some generative model that
> upstream trained on an unclearly licensed scraping of a gazillion
> webpages?

Indeed. Deep learning has been demonstrated to be effective in video
compression as well. However, research projects do not enter Debian;
only implementations of industry standards enter our archive. Only if a
standard like a hypothetical H.267 actually introduced deep learning as
part of its core algorithm should we worry about the blurred borderline.
And even if that eventually happens, upstreams such as VideoLAN and
FFmpeg will have to think about the GPL interpretation before we do.
There is already a historical precedent in FFmpeg, where pre-trained
convolution kernels (shipped in a header file) were excluded from the
GPL source code. I would even bet that the ISO standards group would
have to consider the potential license/legal issues before introducing
such a scheme.

An RL agent that takes 80 GPU-weeks to train is also highly likely to
require a powerful GPU for inference when we actually play such a game.
I play lots of games, and no open-source game has reached that level of
GPU demand. Before this becomes true for free-software games, it will
appear in commercial titles first, ahead of free-software games by
decades.

Generative models for code completion are already a widely known
problem, such as GitHub's Copilot.
They are fancy and useful, but before we really have to think about the
blurred borderline there, we have already seen how controversial it was.

Let's step back a little. When everything you describe comes true, there
will be some way for end users to install those tools onto their
systems. A relevant example is vscode: it is a prevalent editor, favored
by a large user base across all systems, and its absence from the
official repository does not stop the upstream from distributing its own
.deb packages. I understand how tricky it is to package for our archive.
I believe the same will happen for fancy new AI tools (e.g., the face
authentication for Linux tool already ships its own .deb package).

Let me quote a fellow developer: "In Debian we should stop chasing
rabbits." To me, "lazy evaluation" on these problems seems to be the
best strategy. Given Debian's role in this ecosystem, trying to settle
such serious issues before our upstreams do is destined to make
negligible technical progress. And when we really do have to perform
that "lazy evaluation", we will not be unprepared, since the community
is already aware of the precautions and warnings.