Hi,

On Sun, 2025-05-04 at 15:54 +0200, Matthias Urlichs wrote:
> On 04.05.25 15:44, Ansgar 🙀 wrote:
>
> > What is not reproducible (in the reproducible build sense Debian uses)
> > about, say, the Tesseract OCR models?
> My point is that reproducing a model requires input data, which requires us
> to distribute said data, which requires them to be of suitable copyright.

Ah, you mean in the sense of a from-scratch rebuilding of statistical
data, including the possibility of doing different analyses?

Debian doesn't require all the data needed for a from-scratch
reimplementation of a package to be available, though. Such a requirement
would also run into many problems, as relevant documents (RFCs, ISO
standards, design documents, publications, ...) or cloned originals (UNIX
or Windows APIs, games, ...) are often non-free.

This has so far also been the case for statistical data in Debian: simple
aggregates such as the number of packages in Debian, which might be
included in Debian without also including the entire Debian archive as
source, data about word or character frequencies in natural language
texts, and so on.

I guess proponents of the original GR would also find this problematic?

Ansgar

