-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

** Proposal Text **
Choice 3:

Training data for training of AI models is not to be considered "source code" in the context of DFSG. Instead the real source code in such a case is "Training Data Information" and the training data itself is an intermediate build artifact.

AI models are compatible with DFSG only if they provide complete "Training Data Information".

AI models whose reproduction from training data or from training data information is prohibitively expensive or is impractical are compatible with DFSG only if they provide ways to modify the AI model and create derivative works directly from the trained model.

The meaning of "Data Information" is based on definitions and explanations from https://opensource.org/ai/open-source-ai-definition

** Rationale **

The problem of collection and distribution of training data sets can be fully avoided by going another step back - seeing the "training data information" as the *actual* source code and the "training data" itself only being an intermediate build artifact.

While this might not guarantee a fully reproducible rebuild of a model (even if that could be a possibility in some cases by identifying the exact version of the source data with use of hashes), it goes a step better - it makes it possible (if enough resources are invested) to create a new version of the model with new and updated data. And it does not put the onus on Debian to redistribute this intermediate data.

The definition maintains all guidelines of DFSG intact, but adds two clarifications:

1) Training data is not source code. There is a difference in copyright case law between source code and training data. It is really clear that a compiled binary is a derived work of the source code. However, there is no direct copyright relationship between training data and an AI model or its outputs. That is why it should be considered separately. The specifics may need to be adjusted based on future court orders.
For example, it might be necessary to include protections against AI regurgitation.

2) An AI model that is both prohibitively expensive to reproduce and is not easily modifiable does not (de-facto) satisfy the DFSG derived works requirement.
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEFmwrqIlWRDzdY39G+mQ7ph0ievsFAmgaWqoACgkQ+mQ7ph0i
evtd0Q/9FrxQqHQI94GBNQF3uA+BbcghJYq4ZSLxGdrhS6g2IQpZ3Vq+dMJ9WKrr
5Wbmct2u2vt2Mk36WFnuTQDkEv6Cx9QN/lMUfMhcnBVnt8hL1XjRCbQGCMqiUcRz
/QFAGbhjuxwvLWPDAKs3AEWbv0nPTmacEzMVA7s8629ZnRq9sV9fzcnP0jqBBQq0
lvaeDJBiKgpmM3b/ENeyKopmuRroCpqpG2OTghAsMSa7JHqfibgqamHmFkeDaOJt
5HveKmcm9AV2PwVP6UZHpyDciCCPFkZSpor1V+02qhEZBtHKNxGNgAYb/Edxnsxh
1W7MRQrwi8alPXeFKYLKNbD1ZP7WUDjvEXVJF1ucmir0599us+soPjN9VFNkr58F
5ugoubQN+rcz989tPbSnUst6wSPkDgRlkjtaF+uPn6LCIFuvCt3GH+OxJlmYG/K+
1C9Ea60WMkn38b6Yn9gW7WYq09hnP6kpPeXfmD68Ac0YxWKoj18FPD3WDwTc5/S5
Fp+LpJ3vd1PpcYfacA0a+l7H0Vc5K4woRjzCU4KTVeYpBZSe4hRuOn3igFx6Z53E
cUwjoZqnCLU7SoiDP9xXSBTF3UBM/iTcrW33gBE3ujKyv+p2z74eUvrn302ZFA9G
JlDoRdmTHqLlNncEA04FdJ6+VBNY6GZKGXK5r0vDMnQ26MMHWdU=
=imT+
-----END PGP SIGNATURE-----

-- 
Best regards,
    Aigars Mahinovs    mailto:[email protected]

