On 6/5/26 12:23, Daniel P. Berrangé wrote:
IMHO we need unconditional disclosure, because the use of the LLM impacts
the license of the code. QEMU is traditionally expected to be GPLv2+
licensed for all new code, but there's the train of thought that LLM
code is public domain. If it gets human editing afterwards we can
consider that the human edits are GPLv2+ licensed, but IMHO we still
want to know the origins.

I agree - but then we need to allow certain kinds of documentation generation in the policy.

I interpret that to suggest we should not automatically use
SPDX-License-Identifier: GPL-2.0-or-later on LLM generated
code, unless subsequent human editing was non-trivial.

I don't think we will have LLM generated files with no human editing any time soon. Maybe for tests, but even then I expect _some_ kind of nontrivial editing to be there.

It would definitely be intended for merge. There's a lot of boilerplate
code in the Rust bindings, for example, that is voluminous but *mostly*
lacks creativity---the creative part basically can be described by the
spec/docs and should already clear the low bar required for originality,
even if the code is automatically generated. I included a couple examples
in my reply to Peter.

So we know there are examples which are probably low risk from a license
POV, but which are massively larger than 20 lines of code. This just
makes me more uncomfortable with the 20 line rule as the definition of
the policy - we know that rule is wrong / undesirable from the start and
needs this exception to make it viable.

The 20 lines proposal applies only to bugfixes, which have a higher creative content. The other categories currently under discussion are:

                creativity    size      risk    removal/replacement
mech. changes   LOW/NONE      LARGE     **      one way to do it
boilerplate*    LOW/NONE      LARGE     LOW     mostly one way to do it
docs            ***           LARGE     LOW     EASY
tests           MID           LARGE     MID     EASY
bugfixes        HIGH          SMALL     ??      ??

* under discussion, not in draft
** copyrightability of these changes is debatable altogether, since they would/could/should be doable with tools even in the absence of AI
*** depends, but generally the more creative uses would need large cooperation/rework from a human

and it's clear that bugfixes stand out. Which is why I added the (arbitrary, I concede) 20 line rule only for them. I can remove it.

(There's another category that is a can of worms and that I left out, which is the "fancy stackoverflow" category sitting between simple autocomplete and full generation).

I really don't think it can/should be left to a matter of personal
taste.

Something is "mechanical" if it can be assumed that any reasonable
contributor / maintainer would look at it and agree with that idea.

If there is any significant (likelihood of) disagreement on whether
it is mechanical or not, then IMHO we should assume it is NOT
mechanical.

Fair enough---for now, since I don't see us adding more blanket categories, can we say that individual exceptions are possible but need to be discussed on the list?

Paolo
