On Sat, 10 May 2025, Aigars Mahinovs wrote:

>An algorithm that only stores and produces an *average* value across a
>wide set of inputs can not be any kind of compression.
It’s not “just” an average: as has been shown, substantial amounts of
substantially unmodified “training data” can be extracted.

>It is data mining.

The copyright exception for text and data mining is only valid for uses
that extract trends and things like that, not for generative use (and
not for content with explicit opt-out, which those scrapers ignored).

>then go up. If I run "wc" on a copyrighted work, the number of words
>in the document is *not* a derived work from the original document.

If you JPEG-compress a photo of the original document and then
uncompress it, it *is*. And, again, such extraction has been shown to be
possible to a substantial degree for these models, therefore we need to
act as if the output of such generation is derived from its inputs in
the general case. There will always be outputs which aren’t, and inputs
which don’t influence a subset of particular outputs, but the sum of its
outputs is mechanically derived from (most of) the sum of its inputs.

bye,
//mirabilos
--
 /⁀\ The UTF-8 Ribbon
╲ ╱  Campaign against
 ╳   HTML eMail! Also,
╱ ╲  header encryption!
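P.S.: A toy sketch in Python of the wc-vs-JPEG distinction above. The
bit-dropping here merely stands in for JPEG-style quantisation, it is
not an actual codec; the point is that a statistic cannot reproduce the
work, while a lossy round-trip still preserves most of it.

```python
# Toy contrast between a statistic (wc-style word count) and a lossy
# round-trip. Assumption: clearing the low bit of each byte stands in
# for JPEG quantisation; this is NOT a real codec.

text = "the quick brown fox jumps over the lazy dog"

# A wc-style statistic: the count alone cannot reproduce the text.
word_count = len(text.split())

# A lossy "codec": quantise by clearing each byte's low bit.
compressed = bytes(b & ~1 for b in text.encode("ascii"))
restored = compressed.decode("ascii")

# The restored text is imperfect yet substantially the original,
# i.e. clearly derived from it.
matches = sum(a == b for a, b in zip(text, restored))
print(word_count)                 # 9
print(restored == text)           # False: the round-trip is lossy
print(matches / len(text) > 0.5)  # True: most characters survive
```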

