Hi,

I think we really need to not make rules on "AI" itself (which isn't
something clearly defined) or LLMs (still not clearly defined enough)
but rather decide what we really want and stand for. 

So we could cite specific usage of LLMs (with specific LLMs) as an
example or clarification of things that we refuse or rules that
*already exist* or that we create or modify.

Why
---

The danger of going directly against AI, rather than against its
negative consequences, is that:

- It would increase confrontation because the rules are not clear
  anymore.

  Nonfree software is already forbidden. As an example, allowing any
  code generated by LLMs under the limit of 15 lines would allow
  contributors to infringe rules we already have and put the project at
  risk (I explain why at the end).

  In addition, people could argue that some "AI" systems are good, but
  if we instead insist on the rules we already have, this argument
  becomes irrelevant (see below in the section about rules).

Having good rules that are precise enough to do what we want is
extremely important, I think, because otherwise we risk doing the exact
opposite of what we want (this will become clearer below).

Rules that already do exist and possible additional rules
---------------------------------------------------------

We already have very clear rules that are either implicit (because they
are extremely obvious) or explicit, for instance:

- The patches people provide to Guix are supposed to be under a
  free license and legal in several jurisdictions.

  Linux has a DCO for that. We don't have anything similar yet, so we
  probably need a GNU DCO or a Guix DCO, or to reuse the Linux one, to
  make sure that Guix contributors can be held responsible in case of
  (legal) issues.

- The same applies to the packages that the substitute servers ship.

  If some generated code has unknown copyright status (because it's
  generated by an LLM or because we don't know if the files are legal,
  like in the case of the files we remove in the sdcc package), I don't
  think that arguing for keeping software that is illegal under
  copyright law is going to work here.

  The solution has always been to remove such files, like this
  (example taken from sdcc):

> (snippet #~(begin
>          ;; Remove non-free source files.
>          (delete-file-recursively "device/non-free")))

  And even if free software has not always respected all the laws in
  all jurisdictions, and probably never will (free software stands
  against DRM for instance), it has usually respected copyright law
  pretty well, and it was badly damaged by the SCO vs IBM lawsuit
  (because the collateral damage was huge for the FLOSS community at
  large). This is why the DCO and similar processes were adopted by
  free software.

  Another example was the discussion about the ZFS kernel module, which
  cannot be redistributed once it is compiled: Guix decided not to ship
  that module in compiled form (we lacked any analysis for the case of
  non-compiled source code).

- Free software LLMs probably exist (like kaldi) but then it also raises
  the question of the cost for Guix.

  If in the future the cost is small enough, Guix would therefore have
  to train the models like it compiles software.

  This would also need to be made reproducible (to have the model be
  functionally equivalent if it is retrained, because even
  non-reproducible software is functionally equivalent when
  recompiled), to be understandable by humans, etc.

  For instance, Guix is responsible for security, it has to make
  decisions on maintenance, we need to ensure that packages stay
  free, etc., so we need to be able to understand the software we
  package.

  All that is impossible if we have obfuscated source code for instance
  (and in the case of obfuscated source, it's pretty clear that it's
  not considered source code, as it is not the preferred form for
  modification).

- The FSDG already provides ways to exclude third party repositories
  that contain nonfree software (like models under nonfree licenses).

  Many of these slipped in and I think we should consider that as a
  bug and work toward a resolution, step by step, with the
  (limited) resources we have.

> This proposal takes a clear stance that not everyone may agree with.
> This could lead to fragmentation within the Guix community, or within
> the free software community.

I think it's the other way around: violating all the rules above would
lead to fragmentation, infighting, etc.

After all, Guix is free software and it's meant to package free
software, so I don't see why, just because something like an LLM looks
powerful, we should compromise our principles and stop curating
packages to make sure that they respect users' freedom. That makes no
sense.

Even the FSF is going in that direction (reference: the FSF talks on
LLMs etc. 2 years ago at FOSDEM).

Many other distributions really do have to take a stance against AI
precisely because they also want to package nonfree software, so many
of the rules above don't apply to them.

In our case I think we need an extra rule here: we should be able to
refuse packaging software that puts the Guix contributors and/or users
at risk of infighting. An example is software that is maintained by
people who oppress its contributors (an example here is probably
Xlibre, and we discussed that on the mailing list already).

In that discussion, if I recall correctly, Efraim pointed out that
Debian has a rule like that, and I think that it makes sense to adopt a
similar rule. This would allow us not to package anything that looks
like "AI".

In the end, nothing about "AI" looks particularly special: the rules
we have are good, and they are so good that they mostly apply to "AI"
(we just need a DCO and to be able to refuse to package controversial
software so as not to divide the community).

At the end of the day, if LLMs with huge costs or with nonfree models
are somewhat useful to people, then I think the way to go would be to
have nonguix package them and/or make it trivial to install nonguix
(ideally it should also be renamed to look like a real distribution,
host documentation, etc., to be almost like Guix while reusing Guix in
a sustainable way, with a different name to avoid confusion).

We also have a similar rule currently for software that is not
maintained: we can add it to guix-past, and here too the cost looks
small as guix-past is also FSDG compliant.

> Nevertheless, code claimed to be produced in whole or in part by
> genAI **may be incorporated in the limit of at most 15 lines of
> code** to ensure the contributor has a valid copyright claim on the
> code.

I think the problem is something else: how to make sure that those 15
lines are not a derived work of something that is incompatible with the
GPLv3 or later.

Besides that:

- With the current rules, nothing prevents contributors from
  including code or data where copyright doesn't apply, because that is
  compatible with the GPLv3. Requiring everything to be GPLv3 would
  complicate things a lot.

- As far as I know, 15 lines isn't written in any law. 1 line or less
  could be copyrightable. More than 15 lines may not be copyrightable
  (for instance in the case of a sort algorithm that is implemented
  in the canonical way).

  Here it would force anyone who reviews patches to become an expert in
  the laws of many jurisdictions. I don't think we realistically have
  the resources for that.

So if we need a GCD specifically for LLMs, I think we need it as a
clarification of what already exists, potentially adding small
additional rules that close the gaps we have.

Denis.
