Re: [PATCH v2] docs/devel: relax policy on AI-generated contributions

Daniel P. Berrangé Fri, 05 Jun 2026 03:35:12 -0700

On Fri, Jun 05, 2026 at 06:28:31AM -0400, Michael S. Tsirkin wrote:
> On Fri, Jun 05, 2026 at 11:23:54AM +0100, Daniel P. Berrangé wrote:
> > On Fri, Jun 05, 2026 at 05:48:37AM -0400, Michael S. Tsirkin wrote:
> > > On Fri, Jun 05, 2026 at 10:39:15AM +0100, Daniel P. Berrangé wrote:
> > > > On Fri, Jun 05, 2026 at 05:25:36AM -0400, Michael S. Tsirkin wrote:
> > > > > On Fri, Jun 05, 2026 at 10:17:16AM +0100, Daniel P. Berrangé wrote:
> > > > > > On Thu, Jun 04, 2026 at 12:37:58PM +0200, Paolo Bonzini wrote:
> > > > > > > On Wed, 3 Jun 2026, 19:54 Daniel P. Berrangé <[email protected]> wrote:
> > > > > > > 
> > > > > > > > The AI policy should just
> > > > > > > > make a point that we expect to be communicating with people not
> > > > > > > > bots pretending to be people.
> > > > > > > >
> > > > > > > 
> > > > > > > Yes, it's better to have that stated clearly.
> > > > > > > 
> > > > > > > > > True but we also need a rule. The spirit is better explained
> > > > > > > > > elsewhere (and also, building consensus on spirit vs. a rule are
> > > > > > > > > two different things).
> > > > > > > >
> > > > > > > > Do we have a better "elsewhere" in this case? It is a point
> > > > > > > > specifically about intent of the AI policy rule.
> > > > > > > 
> > > > > > > 
> > > > > > > The rule in this draft says 20 lines, tests, mechanical changes and
> > > > > > > docs. The spirit is what is in the commit message, basically to
> > > > > > > maximize the benefit and limit the possible damage?
> > > > > > 
> > > > > > Putting "the spirit" in the commit message is essentially /dev/null
> > > > > > to anyone reading the policy later.
> > > > > > 
> > > > > > > > > See my reply to Peter elsewhere in the thread. I agree with your
> > > > > > > > > concerns for both docs and discretion, but I had specific uses in
> > > > > > > > > mind that I'd like to allow.
> > > > > > > > >
> > > > > > > > > For docs:
> > > > > > > > > - create tutorials and/or feature documentation based on
> > > > > > > > >   functional tests
> > > > > > > >
> > > > > > > > That doesn't sound too appealing to me. Reverse engineering docs or
> > > > > > > > tutorials from our functional tests is exactly the kind of thing
> > > > > > > > that feels likely to result in voluminous text of marginal value,
> > > > > > > > which will place a large burden on reviewers.
> > > > > > > >
> > > > > > > 
> > > > > > > At the same time, couldn't this be helpful for maintainers
> > > > > > > themselves? Let's also look at this from the point of view of
> > > > > > > producing better output, not just from that of being on the
> > > > > > > receiving end of slop. Especially for docs, I have a hard time
> > > > > > > imagining people sending out whole new "manuals"... Ironically, the
> > > > > > > bugfixes rule seems the most dangerous to me from the Dunning-Kruger
> > > > > > > point of view.
> > > > > > > 
> > > > > > > My question is: do we want disclosure for anything that is created
> > > > > > > with the help of LLMs, even if only small parts survive untouched?
> > > > > > > I think so, because a lot more, even if edited, would still
> > > > > > > originally come from AI. But then it's important to have rules
> > > > > > > allowing it and a way to track it.
> > > > > > 
> > > > > > IMHO we need unconditional disclosure, because the use of the LLM
> > > > > > impacts the license of the code. QEMU is traditionally expected to
> > > > > > be GPLv2+ licensed for all new code, but there's a school of thought
> > > > > > that LLM-generated code is public domain. If it gets human editing
> > > > > > afterwards, we can consider that the human edits are GPLv2+
> > > > > > licensed, but IMHO we still want to know the origins.
> > > > > 
> > > > > Wait, that's a big ask.
> > > > > 
> > > > > The DCO explicitly does not ask whether code might be available
> > > > > anywhere else under any other license, only that the contributor can
> > > > > contribute it under the GPL. If it's public domain, then the human
> > > > > can license it under the GPL.
> > > > 
> > > > For new files, in checkpatch we validate that SPDX-License-Identifier
> > > > is explicitly set as GPL-2.0-or-later. Contributors are expected to
> > > > justify any divergence in the commit message.
> > > > 
> > > > I've seen guidance that SPDX-License-Identifier for AI output code
> > > > should NOT state a license, under the theory it is public domain.
> > > 
> > > Not state a license? Recommended by a lawyer? Seen where? Why?
> > 
> > https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
> > 
> >   "The harder case is when an entire source file, or even
> >    an entire repository, is generated by AI. Here, adding
> >    a copyright and license notice may be inappropriate
> >    unless and until human contributions transform the file
> >    into a copyrightable work. "
> > 
> > I interpret that to suggest we should not automatically use
> > SPDX-License-Identifier: GPL-2.0-or-later on LLM-generated
> > code, unless subsequent human editing was non-trivial.
> >
> > > > Ultimately QEMU is a copyleft project as a whole and IMHO we should
> > > > prioritize retaining that for as large a portion of the codebase as
> > > > is practical.
> > > 
> > > But of course. We can make this explicit too: that contributing it
> > > should be under the GPL and/or implies licensing it under the GPL.
> > 
> > The subtlety is that generally, when changing an existing file, you
> > assume the edits are under the same licence as the initial code being
> > edited.
> >
> > If the initial code is LLM-generated and thus presumed public domain,
> > it might be inferred that human edits are public domain too. I don't
> > think we want that interpretation, and we should be explicit that
> > human edits to LLM-generated code are assumed to be GPL-2.0-or-later
> > licensed unless explicitly stated to the contrary.
> 
> Oh, interesting! Thanks! So maybe we should decline whole new files
> for now, unless it's a reorg of existing code so that it inherits the
> existing SPDX tag.


I think the "new file" case is probably relevant to Paolo's example,
though: using an LLM for some Rust boilerplate and then editing it
afterwards.


With regards,
Daniel
-- 
|: https://berrange.com       ~~        https://hachyderm.io/@berrange :|
|: https://libvirt.org          ~~          https://entangle-photo.org :|
|: https://pixelfed.art/berrange   ~~    https://fstop138.berrange.com :|