Hi,
On 2026-02-20 17:43, Cayetano Santos via Development of GNU Guix and the
GNU System distribution. wrote:
In my opinion we should accept contributions from LLMs, provided the
origin of the code is clearly stated somehow (PR title, comment line,
etc.). I don’t think this is something we could avoid, anyway.
I also assume we share the following axiomata.
1. No existing LLM limits its training data
to works belonging to the public domain.
2. LLMs may leak their training data, outputting verbatim copies
of their training materials.
3. Code or text of around 15 lines or more is eligible for copyright [2].
4. We do not take upstreams' copyright claims for granted.
There is a missing point here. To me, free software is all about
empathy: I feel concerned about you, provided you feel concerned about
me. Why is this relevant in this context? Because I’m thinking about code
reviews.
If your contribution is not important enough for you to bother writing
it yourself, don’t expect me to read it, much less spend time
doing a serious review.
This applies as a general rule to everything related to LLM-generated text,
hence the importance of clearly stating the origin of the text.
C.
I do agree with most of what I read in the conversation (to different
degrees) and I don't think we can do anything here and we probably
shouldn't either.
If I just contribute things that I stole from another repository (no
LLM in between), would reviewers check for that?
If the patch is huge, would reviewers accept it?
If the changes don't work, would reviewers let that go in?
All those cases apply whether or not LLMs are involved.
The process is different from the perspective of the person who writes
the code: they may not be aware of the copyright violations, the
hallucinations, etc. the LLM will produce. From our side it doesn't
really matter much.
I could've been pushing LLM-generated code to Guix for years and you
wouldn't notice, unless it were garbage code, badly written, badly
formatted or the like, and we already have mechanisms for checking that.
In summary, if the contribution is well done (no obvious copyright
issues, small changes, they work, etc) there's no way we can know if it
used an LLM or not. If we can actually tell, it would be because the
contribution is not good enough, and in that case we should just reject
it as we already do.
A different story is if we actually want as a community to share our
*opinion* about the usage of LLMs, or if we want to ask contributors to
say if their contribution was made using LLMs just for adjusting the
review criteria in those contributions (they could still lie, though).
As for whether your changes are "not important for you" or anything else...
You wouldn't notice either whether my changes to Guix are important for me
or not. The point is whether they are important and useful for Guix.
I do not review code for the person that sent the code, I review code
for the project. I'm not doing a favor to the person who sent the code,
they are (potentially) doing a favor to me, to the project that I love,
to help me maintain it.
That could also be a very problematic assumption for all the packages we
autogenerate, import and so on. (Are they made for humans to read, Ludovic?)
Of course, I prefer well-crafted, well-described commits that work on the
first run and that I can blindly apply. But life is hard.
I dislike the LLM rhetoric and all, but I just don't see any issue from
our side. It's like telling people not to do drugs if they take part in
Guix. That's not my problem as long as their changes are good.[1]
Of course I can have and I do have an opinion about drugs (usage,
production, legality, etc). It's just not relevant.
Cheers,
Ekaitz
[1]: I do have friends that sometimes write code under the influence of
substances. It is very good sometimes (that's what they say) but some
other times they don't understand what they did. Isn't that similar to
what the LLMs do?