Re: [PATCH v2] docs/devel: relax policy on AI-generated contributions

Michael S. Tsirkin Tue, 23 Jun 2026 12:27:40 -0700

On Fri, May 29, 2026 at 11:46:19AM +0200, Paolo Bonzini wrote:
> Until now QEMU's code provenance policy declined any contribution
> believed to include or derive from AI-generated content.  A blanket ban
> was easy to maintain while LLM output was rarely usable on its own, but
> as the tools improved an absolute prohibition has become harder to
> justify.


In the hope of moving this forward, here's an attempt to collect all the
feedback in one place, summarized with GPT-5.4. Unreliable, of course, but we
have the contributors here after all :). Guys, does anyone feel any of their
feedback got missed or misstated?


## Alex Bennée

- Pointed out two text nits in the patch: a stray closing `**` and the wording 
`deterministic tool`, suggesting `deterministic tool or script`.
- Wanted an explicit rule that AI must not write commit messages, because 
writing the summary and rationale is part of demonstrating that the human 
author understands the change.
- Was okay with AI being used only for grammar and spelling correction of 
human-written commit messages.
- Later posted an experimental AI-generated rewrite that:
  - split the AI policy into a dedicated `ai-usage.rst`,
  - added explicit human-accountability language,
  - made `Signed-off-by` human-only,
  - discouraged prompt dumping in commit messages,
  - and banned AI-attribution tags other than `AI-used-for:`.
- Noted that the model was good at extracting text from the discussion but was 
not applying real judgment, so he normally prefers to review and reword 
AI-generated documentation hunks by hand.
- In the later security-report side discussion, said the interesting part is 
whether AI audit tooling finds useful issues compared to fuzzing/static 
analysis, not which model/vendor produced the report.

## BALATON Zoltan

- Said the terminology around `trailers` was confusing because elsewhere in the 
docs these are referred to as `tags`; that mismatch made the draft harder to 
follow.
- Otherwise thought the revised wording was clearer.
- Later objected to treating LLM output as presumptively public domain:
  - generated output may still contain copied or derived GPL code,
  - or code originating from incompatible or proprietary sources,
  - so "no human copyright holder" does not automatically make the result safe 
or public domain.
- Also noted that code added to QEMU without an explicit license is still 
governed by QEMU's licensing rules, so simplistic public-domain assumptions are 
risky.

## Peter Maydell

- Objected to allowing an individual maintainer to decide that larger 
AI-generated contributions are acceptable:
  - if the project concern is legal/provenance blast radius, that should be a 
project-wide rule,
  - not something that varies by maintainer preference.
- Was especially skeptical of allowing AI-generated documentation and comments:
  - code at least has compile/test guardrails,
  - prose docs have only human review,
  - and documentation/comments are supposed to reflect intended behavior, not 
auto-generated explanations.
- Drew a distinction between:
  - acceptable assistance such as grammar correction or translation of 
human-authored text,
  - versus asking AI to draft documentation from scratch, which he was much 
less happy with.
- In the later Coverity-style discussion, said issue identifiers can 
occasionally be useful for finding patches/commits again later, though not as 
part of an everyday workflow.

## Stefan Hajnoczi

- Flagged one sentence in the proposal as potentially suggesting that 
submitters no longer need to understand the code they send.
- Wanted the policy to stay firmly in the model where the human contributor 
still understands and is responsible for the submission.
- Suggested wording along the lines of "since the risk of bugs not discovered 
by the submitter increases".
- Suggested moving the AI policy into a separate document and referencing it 
from `AGENTS.md`, so coding agents operating in-tree are explicitly told to 
refuse tasks that violate the policy.

## Daniel P. Berrangé

- Objected to using "projects accepting AI-assisted content have not run into 
serious legal trouble so far" as reassuring evidence:
  - copyright risk is a slow-burn issue,
  - lack of lawsuits so far is not strong evidence that the legal risk is low.
- Said `small bug fixes` does not line up well with the separate concern about 
`core code`:
  - the real policy goal is closer to low originality / low copyrightability 
risk / easy reversibility,
  - not "bug fix" as a category.
- Strongly preferred putting the AI rules into a dedicated `ai-usage` document 
for easier linking and clarity.
- Wanted the policy to cover social expectations as well as legal/technical 
ones:
  - QEMU collaboration should remain human-to-human,
  - contributors should not feed review mail into an LLM and paste the answer 
back,
  - reviewers using AI should disclose that fact,
  - and contributor identities should still represent real humans even when 
pseudonymous.
- Said the policy's "spirit" needs to live in the policy text itself, not only 
in a commit message, because later readers of the policy will never see the 
commit message rationale.
- Wanted stronger and earlier wording that only a human may add 
`Signed-off-by`, and pointed to Linux kernel wording as a model.
- Wanted explicit human authorship of commit messages and cover letters where 
non-trivial explanation is required.
- Liked `AI-used-for:` because it gives reviewers useful information without 
advertising a vendor.
- Wanted `shape your patch` tightened to something like `shape the content of 
the submitted patch`, so background AI use is excluded more clearly.
- Wanted unconditional disclosure of AI use, because provenance matters even 
when the surviving AI-generated portion is small.
- Thought prompts generally should not be included:
  - if the information matters to reviewers, it belongs in the human-written 
commit message,
  - otherwise it just adds clutter.
- Thought `QEMU does not use Assisted-by / Co-authored-by / Generated-by` was 
too weak:
  - if those tags are unwanted, the policy should explicitly forbid them,
  - and possibly enforce that in `checkpatch.pl`.
- Said the general tag rules should also be documented earlier in the 
provenance docs, not only in the AI section.
- Was especially wary of prose documentation under `docs/`:
  - AI prose can become convincing-sounding slop,
  - review of prose is already expensive,
  - and non-expert contributors may not actually be able to fact-check the text.
- Proposed handling documentation more incrementally:
  - start with a tightly constrained initial docs policy,
  - then relax it later only if experience shows that broader allowances are 
worth it.
- Was more comfortable with inline API docs/comments than with prose 
documentation.
- Opposed leaving larger exceptions to individual maintainer discretion because 
that would create inconsistent standards across subsystems and confuse 
contributors.
- Repeatedly pushed back on the `20 lines` rule:
  - it is not measuring the right thing,
  - and if there are already larger low-risk categories like mechanical or 
boilerplate code, the rule is the wrong policy center.
- Raised licensing concerns around AI-generated new files and SPDX handling:
  - some guidance suggests whole-file AI output should not automatically get a 
license header unless human edits make it copyrightable,
  - but QEMU should still make clear that human edits to AI-generated code are 
assumed GPL-2.0-or-later unless explicitly stated otherwise.
- Also clarified that any "public domain" argument for LLM output only makes 
sense when the output is not credibly a derived work:
  - cloning QEMU into another language would likely still be a derived work,
  - and some non-trivial feature code that follows established QEMU design 
patterns could also plausibly still be GPL-derived.
- Rejected the idea that "mechanical" should be left to personal taste:
  - if reasonable people might disagree whether a change is mechanical,
  - the policy should assume it is not mechanical.
- Said that if mechanical changes or boilerplate are allowed, the policy should 
define them clearly enough that contributors can understand what is allowed 
without having to ask permission in advance.
- In the later security-report side discussion, argued against giving AI tools 
`Reported-by` credit:
  - the accountable party is the human reporter,
  - and the project should not provide free advertising to tool vendors.
- When Alex posted an AI-generated rewrite of the policy, said it had 
incorporated comments too indiscriminately, become more verbose, lost 
structure, and was drifting toward slop.

## Kevin Wolf

- Said `20 lines or less` is a poor proxy for what the project is actually 
trying to allow:
  - the real target is trivial or low-complexity code,
  - and it is easy to write 20 lines that are not trivial at all.
- Suggested line count could at most be an example, not the entire rule.
- Noted that "just say no to slop" is easy to say but not especially 
comfortable in practice for maintainers.
- Was skeptical that LLM workflows are meaningfully reproducible anyway.
- In the later security-report discussion, said AI-found issues feel analogous 
to Coverity:
  - they generally do not deserve a `Reported-by` trailer,
  - but mention in commit-message prose can make sense if useful.

## Michael S. Tsirkin

- Suggested explicitly allowing AI to correct grammar and spelling in text the 
contributor already wrote, as long as AI is not writing the text from scratch.
- Argued there are cases where a maintainer may reasonably judge generated code 
to be so QEMU-specific or so tightly coupled to the current tree that 
accidental copying risk is negligible.
- Repeatedly emphasized that AI is especially useful for helping non-native 
English speakers.
- Questioned how effective the `20 lines` rule would be if many small 
AI-assisted contributions simply accumulate over time.
- Later suggested that if mechanical changes are allowed, the policy should say 
`clearly mechanical` or `obviously mechanical` and include examples.
- Also suggested that for borderline `mechanical` changes, contributors should 
check with maintainers up front because what counts as mechanical is still a 
maintainer judgment.
- In the licensing discussion, argued that if something truly is public domain, 
a human can still submit it under GPL terms, and that the policy could 
explicitly say contributing it to QEMU implies appropriate GPL licensing.
- Floated declining whole new AI-generated files for now unless they are just 
reorganizations of existing code that already inherit SPDX/licensing context.
- Also suggested maintainers can warn and eventually ignore repeat slop 
submitters.
- In response to the later security-report question, said AI-assisted security 
scanning was already allowed under the current policy.

## Christian Borntraeger

- Asked how the policy should treat a human-submitted patch that is based on an 
AI-generated security report.
- Asked whether, if such reports are allowed, the project should add something 
like `Reported-by: Claude` or `Reported-by: ChatGPT`.
