On Fri, May 29, 2026 at 11:46:19AM +0200, Paolo Bonzini wrote:
> Until now QEMU's code provenance policy declined any contribution
> believed to include or derive from AI-generated content. A blanket ban
> was easy to maintain while LLM output was rarely usable on its own, but
> as the tools improved an absolute prohibition has become harder to
> justify.

In the hope of moving this forward, here's an attempt to get all feedback in one place. It was produced by GPT-5.4, so it is unreliable of course, but we have the contributors here after all :). Guys, does anyone feel that any of their feedback got missed or misstated?

## Alex Bennée

- Pointed out two text nits in the patch: a stray closing `**` and the wording `deterministic tool`, suggesting `deterministic tool or script`.
- Wanted an explicit rule that AI must not write commit messages, because writing the summary and rationale is part of demonstrating that the human author understands the change.
- Was okay with AI helping only with grammar and spelling correction of human-written commit messages.
- Later posted an experimental AI-generated rewrite that:
  - split the AI policy into a dedicated `ai-usage.rst`,
  - added explicit human-accountability language,
  - made `Signed-off-by` human-only,
  - discouraged prompt dumping in commit messages,
  - and banned AI-attribution tags other than `AI-used-for:`.
- Noted that the model was good at extracting text from the discussion but was not applying real judgment, so he normally prefers to review and reword AI-generated documentation hunks by hand.
- In the later security-report side discussion, said the interesting part is whether AI audit tooling finds useful issues compared to fuzzing/static analysis, not which model/vendor produced the report.

## BALATON Zoltan

- Said the terminology around `trailers` was confusing because elsewhere in the docs these are referred to as `tags`; that mismatch made the draft harder to follow.
- Otherwise thought the revised wording was clearer.
- Later objected to treating LLM output as presumptively public domain:
  - generated output may still contain copied or derived GPL code,
  - or code originating from incompatible or proprietary sources,
  - so "no human copyright holder" does not automatically make the result safe or public domain.
- Also noted that code added to QEMU without an explicit license is still governed by QEMU's licensing rules, so simplistic public-domain assumptions are risky.

## Peter Maydell

- Objected to allowing an individual maintainer to decide that larger AI-generated contributions are acceptable:
  - if the project concern is legal/provenance blast radius, that should be a project-wide rule,
  - not something that varies by maintainer preference.
- Was especially skeptical of allowing AI-generated documentation and comments:
  - code at least has compile/test guardrails,
  - prose docs have only human review,
  - and documentation/comments are supposed to reflect intended behavior, not auto-generated explanations.
- Drew a distinction between:
  - acceptable assistance such as grammar correction or translation of human-authored text,
  - versus asking AI to draft documentation from scratch, which he was much less happy with.
- In the later Coverity-style discussion, said issue identifiers can occasionally be useful for refinding patches/commits, though not as part of an everyday workflow.

## Stefan Hajnoczi

- Flagged one sentence in the proposal as potentially suggesting that submitters no longer need to understand the code they send.
- Wanted the policy to stay firmly in the model where the human contributor still understands and is responsible for the submission.
- Suggested wording along the lines of "since the risk of bugs not discovered by the submitter increases".
- Suggested moving the AI policy into a separate document and referencing it from `AGENTS.md`, so coding agents operating in-tree are explicitly told to refuse tasks that violate the policy.

## Daniel P. Berrangé

- Objected to using "projects accepting AI-assisted content have not run into serious legal trouble so far" as reassuring evidence:
  - copyright risk is a slow-burn issue,
  - lack of lawsuits so far is not strong evidence that the legal risk is low.
- Said `small bug fixes` does not line up well with the separate concern about `core code`:
  - the real policy goal is closer to low originality / low copyrightability risk / easy reversibility,
  - not "bug fix" as a category.
- Strongly preferred putting the AI rules into a dedicated `ai-usage` document for easier linking and clarity.
- Wanted the policy to cover social expectations as well as legal/technical ones:
  - QEMU collaboration should remain human-to-human,
  - contributors should not feed review mail into an LLM and paste the answer back,
  - reviewers using AI should disclose that fact,
  - and contributor identities should still represent real humans even when pseudonymous.
- Said the policy's "spirit" needs to live in the policy text itself, not only in a commit message, because later readers of the policy will never see the commit message rationale.
- Wanted stronger and earlier wording that only a human may add `Signed-off-by`, and pointed to Linux kernel wording as a model.
- Wanted explicit human authorship of commit messages and cover letters where non-trivial explanation is required.
- Liked `AI-used-for:` because it gives reviewers useful information without advertising a vendor.
- Wanted `shape your patch` tightened to something like `shape the content of the submitted patch`, so background AI use is excluded more clearly.
- Wanted unconditional disclosure of AI use, because provenance matters even when the surviving AI-generated portion is small.
- Thought prompts generally should not be included:
  - if the information matters to reviewers, it belongs in the human-written commit message,
  - otherwise it just adds clutter.
- Thought `QEMU does not use Assisted-by / Co-authored-by / Generated-by` was too weak:
  - if those tags are unwanted, the policy should explicitly forbid them,
  - and possibly enforce that in `checkpatch.pl`.
- Said the general tag rules should also be documented earlier in the provenance docs, not only in the AI section.
- Was especially wary of prose documentation under `docs/`:
  - AI prose can become convincing-sounding slop,
  - review of prose is already expensive,
  - and non-expert contributors may not actually be able to fact-check the text.
- Proposed handling documentation more incrementally:
  - start with a tightly constrained initial docs policy,
  - then relax it later only if experience shows that broader allowances are worth it.
- Was more comfortable with inline API docs/comments than with prose documentation.
- Opposed leaving larger exceptions to individual maintainer discretion because that would create inconsistent standards across subsystems and confuse contributors.
- Repeatedly pushed back on the `20 lines` rule:
  - it is not measuring the right thing,
  - and if there are already larger low-risk categories like mechanical or boilerplate code, the rule is the wrong policy center.
- Raised licensing concerns around AI-generated new files and SPDX handling:
  - some guidance suggests whole-file AI output should not automatically get a license header unless human edits make it copyrightable,
  - but QEMU should still make clear that human edits to AI-generated code are assumed GPL-2.0-or-later unless explicitly stated otherwise.
- Also clarified that any "public domain" argument for LLM output only makes sense when the output is not credibly a derived work:
  - cloning QEMU into another language would likely still be a derived work,
  - and some non-trivial feature code that follows established QEMU design patterns could also plausibly still be GPL-derived.
- Rejected the idea that "mechanical" should be left to personal taste:
  - if reasonable people might disagree whether a change is mechanical,
  - the policy should assume it is not mechanical.
- Said that if mechanical changes or boilerplate are allowed, the policy should define them clearly enough that contributors can understand what is allowed without having to ask permission in advance.
- In the later security-report side discussion, argued against giving AI tools `Reported-by` credit:
  - the accountable party is the human reporter,
  - and the project should not provide free advertising to tool vendors.
- When Alex posted an AI-generated rewrite of the policy, said it had incorporated comments too indiscriminately, become more verbose, lost structure, and was drifting toward slop.

## Kevin Wolf

- Said `20 lines or less` is a poor proxy for what the project is actually trying to allow:
  - the real target is trivial or low-complexity code,
  - and it is easy to write 20 lines that are not trivial at all.
- Suggested line count could at most be an example, not the entire rule.
- Noted that "just say no to slop" is easy to say but not especially comfortable in practice for maintainers.
- Was skeptical that LLM workflows are meaningfully reproducible anyway.
- In the later security-report discussion, said AI-found issues feel analogous to Coverity:
  - they generally do not deserve a `Reported-by` trailer,
  - but mention in commit-message prose can make sense if useful.

## Michael S. Tsirkin

- Suggested explicitly allowing AI to correct grammar and spelling in text the contributor already wrote, as long as AI is not writing the text from scratch.
- Argued there are cases where a maintainer may reasonably judge generated code to be so QEMU-specific or so tightly coupled to the current tree that accidental copying risk is negligible.
- Repeatedly emphasized that AI is especially useful for helping non-native English speakers.
- Questioned how effective the `20 lines` rule would be if many small AI-assisted contributions simply accumulate over time.
- Later suggested that if mechanical changes are allowed, the policy should say `clearly mechanical` or `obviously mechanical` and include examples.
- Also suggested that for borderline `mechanical` changes, contributors should check with maintainers up front because what counts as mechanical is still a maintainer judgment.
- In the licensing discussion, argued that if something truly is public domain, a human can still submit it under GPL terms, and that the policy could explicitly say contributing it to QEMU implies appropriate GPL licensing.
- Floated declining whole new AI-generated files for now unless they are just reorganizations of existing code that already inherit SPDX/licensing context.
- Also suggested maintainers can warn and eventually ignore repeat slop submitters.
- In response to the later security-report question, said AI-assisted security scanning was already allowed under the current policy.

## Christian Borntraeger

- Asked how the policy should treat a human-submitted patch that is based on an AI-generated security report.
- Asked whether, if such reports are allowed, the project should add something like `Reported-by: Claude` or `Reported-by: ChatGPT`.
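
Separately from the per-person notes: several points above concern commit message tags (forbidding `Assisted-by` / `Co-authored-by` / `Generated-by` outright, keeping `AI-used-for:`, possibly enforcing this in `checkpatch.pl`). To make that concrete, here is a rough sketch of what such a check might look like. It is in Python rather than the Perl of `checkpatch.pl`, purely for illustration. The function name `find_forbidden_tags`, the exact tag list, and the case-insensitive matching are all assumptions, not anything the thread agreed on.

```python
import re

# Tags the discussion proposed forbidding outright. Assumption: the exact
# list and the case-insensitive matching are illustrative, not agreed policy.
FORBIDDEN_TAGS = ("Assisted-by", "Co-authored-by", "Generated-by")

# A tag-style line: "Tag-name: value" starting at the beginning of a line.
TAG_RE = re.compile(r"^([A-Za-z][A-Za-z-]*):\s*(.*)$")


def find_forbidden_tags(commit_message):
    """Return (line_number, tag) pairs for forbidden tags in a commit message.

    Hypothetical helper; line numbers are 1-based.
    """
    forbidden = {tag.lower() for tag in FORBIDDEN_TAGS}
    hits = []
    for lineno, line in enumerate(commit_message.splitlines(), start=1):
        match = TAG_RE.match(line)
        if match and match.group(1).lower() in forbidden:
            hits.append((lineno, match.group(1)))
    return hits


if __name__ == "__main__":
    # Made-up commit message, for demonstration only.
    message = """\
hw/example: fix off-by-one in length check

AI-used-for: spotting the bug
Generated-by: SomeTool
Signed-off-by: A Human <human@example.org>
"""
    for lineno, tag in find_forbidden_tags(message):
        print(f"line {lineno}: tag '{tag}' is not allowed")
```

On the made-up message this flags only the `Generated-by` line and leaves `AI-used-for:` and `Signed-off-by` alone. Whether `Signed-off-by` really comes from a human is not something a script like this can check.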
