On Fri, May 29, 2026 at 04:34:45PM +0100, Peter Maydell wrote:
> On Fri, 29 May 2026 at 10:46, Paolo Bonzini <[email protected]> wrote:
> >
> > Until now QEMU's code provenance policy declined any contribution
> > believed to include or derive from AI-generated content.  A blanket ban
> > was easy to maintain while LLM output was rarely usable on its own, but
> > as the tools improved an absolute prohibition has become harder to
> > justify.
> >
> > The concern that motivated the policy is unchanged, and it is worth stating
> > precisely: the DCO is about whether the submitter has the legal right to
> > contribute the code, not about "creative expression".  While the status of
> > LLM output seems to be converging towards non-copyrightability, questions
> > around unintentional reproduction of copyrighted code are still open.
> > What has shifted is the balance of risk:
> >
> > - projects accepting AI-assisted content have not run into serious
> >   legal trouble so far, which suggests the probability of the risk
> >   materializing is not high;
> >
> > - other organizations, such as Red Hat[1], have assessed the risk as
> >   acceptable -- though a community of individual developers does not
> >   have the legal backing of a company, and even an unfounded dispute
> >   would be a long-lasting distraction from work on QEMU.
> >
> > Nevertheless, even Red Hat mentions that "the possibility of occasional
> > replication cannot be ignored".  In QEMU's view, attentiveness and
> > oversight are not a practical way to address this; yet as a copyleft
> > project, copyright and code provenance are of utmost importance to us.
> > Therefore, it remains prudent to only permit AI assistance where the
> > ramifications of copyright violations are at least easy to revert and
> > unlikely to spread: tests, documentation, mechanical changes, and small
> > bug fixes.  Core code that other things depend on, and that cannot
> > simply be thrown away once a problem is noticed long after the fact,
> > stays off-limits without prior agreement from a maintainer.
> 
> This all makes sense to me, except for the part where we allow
> a maintainer to say "actually it's OK". Where our justification
> for not wanting AI contributions rests on "it's too much burden
> on maintainers to have to deal with and review it", allowing an
> individual maintainer to say "I'm OK with that burden in this case
> or for this particular contribution" logically follows as a
> possible relaxation. But if as a project we want to limit the
> blast-radius if we find we have to rip out a hypothetical tainted
> contribution, shouldn't that mean that we hold that as a project-wide
> line, rather than leaving it up to the opinion of the individual
> maintainer?

It's not clear it's practical anyway. Say we limit contributions to 20+
lines; what did we achieve? They accumulate over time.


> > Related to this, and already visible in the incredible uptick in
> > security reports, is the question of maintainer burnout and the shift in
> > effort from the author to the reviewer of the code.  AI lowers the cost of
> > producing a patch but does nothing to lower the cost of understanding and
> > reviewing one; if anything it raises it, since a reviewer can no longer
> > assume that the submitter has reasoned through every line.  The limits
> > above work just as much to keep the volume of review work sustainable.
> >
> > Revise the policy according to the above considerations, and introduce the
> > "AI-used-for:" trailer as a record of where AI was used.  The standard is
> > slightly different from the more usual "Assisted-by"; the intention is for
> > the metadata to provide more information for reviewers to judge the result.
> >
> > In any case, use of AI does not relax any other contribution requirement:
> > authors still comply with the DCO and take responsibility for the whole
> > patch via Signed-off-by.
> >
> > [Commit message largely based on
> >  https://lore.kernel.org/qemu-devel/[email protected]/, by
> >  Kevin Wolf. - Paolo]
> 
> > +**Documentation and code comments**
> > +  While AI can help draft text, it still requires significant human
> > +  oversight.  Pay attention to the organization and flow of the generated
> > +  text, and strictly fact-check all technical details as LLMs are prone
> > +  to being confidently wrong.
> 
> I think the application to documentation and comments is the part
> I'm least enthusiastic about here. For changes to code, we have at
> least some guardrails on the AI output, in the fact that it has to
> compile and to pass tests. For changes to documentation, the
> only guardrails are human eyeballs.
> 
> Also both comments and documentation ideally are a record of
> what we intended the behaviour to be. If an LLM is effectively
> autogenerating something documentation-shaped from the code we
> lose that.
> 
> -- PMM

