Re: [PATCH v2] docs/devel: relax policy on AI-generated contributions

BALATON Zoltan Fri, 05 Jun 2026 05:41:06 -0700

On Fri, 5 Jun 2026, Daniel P. Berrangé wrote:

On Fri, Jun 05, 2026 at 05:48:37AM -0400, Michael S. Tsirkin wrote:

On Fri, Jun 05, 2026 at 10:39:15AM +0100, Daniel P. Berrangé wrote:

On Fri, Jun 05, 2026 at 05:25:36AM -0400, Michael S. Tsirkin wrote:

On Fri, Jun 05, 2026 at 10:17:16AM +0100, Daniel P. Berrangé wrote:

On Thu, Jun 04, 2026 at 12:37:58PM +0200, Paolo Bonzini wrote:

On Wed, 3 Jun 2026, 19:54 Daniel P. Berrangé <[email protected]> wrote:

The AI policy should just
make a point that we expect to be communicating with people not
bots pretending to be people.


Yes, it's better to have that stated clearly.

True, but we also need a rule. The spirit is better explained elsewhere
(and also, building consensus on the spirit vs. on a rule are two different
things).


Do we have a better elsewhere in this case? It is a point specifically
about the intent of the AI policy rule.



The rule in this draft says 20 lines, tests, mechanical changes and docs.
The spirit is what is in the commit message, basically to maximize the
benefit and limit the possible damage?


Putting "the spirit" in the commit message is essentially /dev/null to
anyone reading the policy later.

See my reply to Peter elsewhere in the thread. I agree with your
concerns for both docs and discretion, but I had specific uses in mind
that I'd like to allow.

For docs:
- create tutorials and/or feature documentation based on functional tests


That doesn't sound too appealing to me. Reverse engineering docs or
tutorials from our functional tests is exactly the kind of thing that feels
likely to result in voluminous text of marginal value which will have a large
burden on reviewers.


At the same time this can be helpful for maintainers themselves? Let's also
look at this from the point of view of producing better output, not just
from that of being on the receiving end of slop. Especially for docs I have
a hard time imagining people sending out whole new "manuals"... The
bugfixes rule ironically seems the most dangerous to me from the
Dunning-Kruger point of view.

My question is: do we want disclosure for anything that is created with the help
of LLMs, even if only small parts survive untouched? I think so, because a
lot more, even if edited, would still be originally from AI. But then it's
important to have rules allowing it and a way to track it.


IMHO we need unconditional disclosure, because the use of the LLM impacts
the license of the code. QEMU is traditionally expected to be GPLv2+
licensed for all new code, but there's the train of thought that LLM
code is public domain.
If it gets human editing afterwards we can
consider that the human edits are GPLv2+ licensed, but IMHO we still
want to know the origins.


Wait that's a big ask.

The DCO explicitly does not ask if code might be available anywhere else
under any other license. Just that the contributor can contribute under GPL.
If it's public domain then the human can license it under GPL.


For new files, in checkpatch we validate that SPDX-License-Identifier
is explicitly set as GPL-2.0-or-later. Contributors are expected to
justify any divergence in the commit message.

I've seen guidance that SPDX-License-Identifier for AI output code
should NOT state a license, under the theory it is public domain.


Not state a license? Recommended by a lawyer? Seen where? Why?


https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues

 "The harder case is when an entire source file, or even
  an entire repository, is generated by AI. Here, adding
  a copyright and license notice may be inappropriate
  unless and until human contributions transform the file
  into a copyrightable work. "

I interpret that to suggest we should not automatically use
SPDX-License-Identifier: GPL-2.0-or-later on LLM generated
code, unless subsequent human editing was non-trivial.

The presumption that LLM generated code is public domain is dubious. If you
tell it to regenerate part of the QEMU source after it has seen the GPL
sources and it comes up with something equivalent, does that make the
generated version public domain? If so, people could just rewrite GPL code
and make it proprietary. This can't be right, as the generated code will
likely contain parts copied from the original and so still fall under GPL.
What if I just tell an LLM to rewrite QEMU in C++? Will that make a public
domain version that I can then make closed source even though it still
contains large parts of GPL code? I don't think so. The code generated by
an LLM comes from somewhere, but nobody can tell where from, so nobody
knows what licence it is under either. If you're lucky it comes from
examples or other sources with a free licence, but it could be anything,
even some open source code not compatible with GPL, or proprietary code.
The idea of public domain probably comes from there being no human to hold
the copyright, but what about cases of an LLM copying copyleft code? That
should not make it public domain. This is similar to the case when somebody
who worked on proprietary code before then writes some open source code
that does similar things, or vice versa. What is the legal status of those
cases? Can the other party claim copyright for the code? Probably only if
the person recalls whole parts that resemble each other closely, which
could happen. The risk is probably the same with LLMs, and thus the
handling of this should probably be similar. This seems more complex than
assuming anything from an LLM is public domain.

Ultimately QEMU is a copyleft project as a whole and IMHO we should
prioritize retaining that for as large a portion of the codebase as
is practical.


But of course. We can make this explicit too: that
contributing it should be under GPL and/or implies licensing it under GPL.


The subtlety is that generally when changing an existing file, you assume
the edits are under the same licence as the initial code being edited.

If the initial code is LLM generated & thus presumed public domain, it
might be inferred that human edits are public domain too. I don't think
we want to have that interpretation and should be explicit that human
edits to LLM generated code are assumed to be GPL-2.0-or-later licensed
unless explicitly stated to the contrary.

The LICENSE file in QEMU says that sources without a licence are
GPL-2.0-or-later so if you add public domain code, it will be that licence
as part of QEMU and won't retain original public domain status.


Regards,
BALATON Zoltan
