Re: [PATCH v2] docs/devel: relax policy on AI-generated contributions

Daniel P . Berrangé Fri, 05 Jun 2026 02:40:41 -0700

On Fri, Jun 05, 2026 at 05:25:36AM -0400, Michael S. Tsirkin wrote:
> On Fri, Jun 05, 2026 at 10:17:16AM +0100, Daniel P. Berrangé wrote:
> > On Thu, Jun 04, 2026 at 12:37:58PM +0200, Paolo Bonzini wrote:
> > > On Wed, 3 Jun 2026 at 19:54, Daniel P. Berrangé <[email protected]> wrote:
> > > 
> > > > The AI policy should just
> > > > make a point that we expect to be communicating with people not
> > > > bots pretending to be people.
> > > >
> > > 
> > > Yes, it's better to have that stated clearly.
> > > 
> > > > > True but we also need a rule. The spirit is better explained elsewhere
> > > > > (and also, building consensus on spirit vs. a rule are two different
> > > > > things).
> > > >
> > > > Do we have a better elsewhere in this case ?  It is a point specifically
> > > > about intent of the AI policy rule.
> > > 
> > > 
> > > The rule in this draft says 20 lines, tests, mechanical changes and docs.
> > > The spirit is what is in the commit message, basically to maximize the
> > > benefit and limit the possible damage?
> > 
> > Putting "the spirit" in the commit message is essentially /dev/null to
> > anyone reading the policy later.
> > 
> > > > > See my reply to Peter elsewhere in the thread. I agree with your
> > > > > concerns for both docs and discretion, but I had specific uses in mind
> > > > > that I'd like to allow.
> > > > >
> > > > > For docs:
> > > > > - create tutorials and/or feature documentation based on functional tests
> > > >
> > > > That doesn't sound too appealing to me. Reverse engineering docs or
> > > > tutorials from our functional tests is exactly the kind of thing that
> > > > feels likely to result in voluminous text of marginal value which will
> > > > have a large burden on reviewers.
> > > >
> > > 
> > > At the same time this can be helpful for maintainers themselves? Let's
> > > also look at this from the point of view of producing better output, not
> > > just from that of being on the receiving end of slop. Especially for docs
> > > I have a hard time imagining people sending out whole new "manuals"... The
> > > bugfixes rule ironically seems the most dangerous to me from the
> > > Dunning-Kruger point of view.
> > > 
> > > My question is: do we want disclosure for anything that is created with
> > > the help of LLMs, even if only small parts survive untouched? I think so,
> > > because a lot more, even if edited, would still be originally from AI. But
> > > then it's important to have rules allowing it and a way to track it.
> > 
> > IMHO we need unconditional disclosure, because the use of the LLM impacts
> > the license of the code. QEMU is traditionally expected to be GPLv2+
> > licensed for all new code, but there's the train of thought that LLM
> > code is public domain. If it gets human editing afterwards we can
> > consider that the human edits are GPLv2+ licensed, but IMHO we still
> > want to know the origins.
> 
> Wait that's a big ask.
> 
> The DCO explicitly does not ask if code might be available anywhere else
> under any other license. Just that the contributor can contribute under GPL.
> If it's public domain then the human can license it under GPL.


For new files, in checkpatch we validate that SPDX-License-Identifier
is explicitly set as GPL-2.0-or-later. Contributors are expected to
justify any divergence in the commit message.
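
As a rough illustration of that kind of check (a minimal Python sketch, not
QEMU's actual checkpatch.pl, which is written in Perl; the function names
and the five-line scan window are assumptions made up for this example):

```python
import re

# License expected for new files, as stated above.
EXPECTED = "GPL-2.0-or-later"

# Capture the SPDX expression, ignoring a trailing comment terminator.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(.+?)\s*(?:\*/|-->)?\s*$")


def spdx_license(text, max_lines=5):
    """Return the SPDX expression found in the first few lines, or None."""
    for line in text.splitlines()[:max_lines]:
        match = SPDX_RE.search(line)
        if match:
            return match.group(1)
    return None


def check_new_file(text):
    """Classify a new file as 'ok', 'missing' or 'divergent' (hypothetical labels)."""
    found = spdx_license(text)
    if found is None:
        return "missing"
    if found == EXPECTED:
        return "ok"
    return "divergent"
```

Any result other than "ok" would be the cue to look for a justification in
the commit message.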

I've seen guidance that SPDX-License-Identifier for AI output code
should NOT state a license, under the theory it is public domain.

If it is human edited though, I would expect it to overrule this
guidance and explicitly state GPL-2.0-or-later in the SPDX tag
unless the contributor wants to explicitly put their own edits
under public domain too.

Ultimately QEMU is a copyleft project as a whole and IMHO we should
prioritize retaining that for as large a portion of the codebase as
is practical.

> > > It would definitely be intended for merge. There's a lot of boilerplate
> > > code in the Rust bindings, for example, that is voluminous but *mostly*
> > > lacks creativity---the creative part basically can be described by the
> > > spec/docs and should already clear the low bar required for originality,
> > > even if the code is automatically generated. I included a couple examples
> > > in my reply to Peter.
> > 
> > So we know there are examples which are probably low risk from a license
> > POV, but which are massively larger than 20 lines of code. This just
> > makes me more uncomfortable with the 20 line rule as the definition of
> > the policy - we know that rule is wrong / undesirable from the start and
> > needs this exception to make it viable.
> 
> So 20 lines or mechanical changes? What is considered mechanical will be
> decided by maintainers; contributors should check with them up front.

If we want to allow mechanical changes / boilerplate, then we
should express that in the policy such that the policy can be reasonably
understood without having to ask permission / questions ahead of time.

With regards,
Daniel
-- 
|: https://berrange.com       ~~        https://hachyderm.io/@berrange :|
|: https://libvirt.org          ~~          https://entangle-photo.org :|
|: https://pixelfed.art/berrange   ~~    https://fstop138.berrange.com :|
