Re: [PATCH v2] docs/devel: relax policy on AI-generated contributions

Michael S. Tsirkin Wed, 03 Jun 2026 08:07:52 -0700

On Wed, Jun 03, 2026 at 03:59:35PM +0100, Daniel P. Berrangé wrote:
> On Fri, May 29, 2026 at 11:46:19AM +0200, Paolo Bonzini wrote:
> > The concern that motivated the policy is unchanged, and it is worth stating
> > precisely: the DCO is about whether the submitter has the legal right to
> > contribute the code, not about "creative expression".  While the status of
> > LLM output seems to be converging towards non-copyrightability, questions
> > around unintentional reproduction of copyrighted code are still open.
> > What has shifted is the balance of risk:
> > 
> > - projects accepting AI-assisted content have not run into serious
> >   legal trouble so far, which suggests the probability of the risk
> >   materializing is not high;
> 
> "so far" is doing a lot of heavy lifting here & generally I think this
> rather over-estimates the speed at which legal issues might arise.
> Copyright infringement is a "slow burn" where the risk accumulates
> over time and issues, if discovered, may not be litigated immediately.
> 
> That is NOT to say the risk is high. The risk may well still be
> low. I'm just saying that there's not been sufficient time to use
> "lack of lawsuits" as a rationalization IMHO.
> 
> > - other organizations, such as Red Hat[1], have assessed the risk as
> >   acceptable -- though a community of individual developers does not
> >   have the legal backing of a company, and even an unfounded dispute
> >   would be a long-lasting distraction from work on QEMU.
> >
> > Nevertheless, even Red Hat mentions that "the possibility of occasional
> > replication cannot be ignored".  In QEMU's view, attentiveness and
> > oversight are not a practical way to address this; yet as a copyleft
> > project, copyright and code provenance are of utmost importance to us.
> 
> 
> > Therefore, it remains prudent to only permit AI assistance where the
> > ramifications of copyright violations are at least easy to revert and
> > unlikely to spread: tests, documentation, mechanical changes, and small
> > bug fixes.  Core code that other things depend on, and that cannot
> > simply be thrown away once a problem is noticed long after the fact,
> > stays off-limits without prior agreement from a maintainer.
> 
> The interaction of "small bug fixes" and "core code" doesn't
> fit well IMHO. A "bug fix" describes an action, but the code
> that is changed is usually a "feature" and will often be a
> "core" part of something in QEMU.
> 
> IIUC, by "small bug fixes", what you're actually trying to
> express is an acceptance of code that is either
> 
>   * unlikely to meet the threshold for copyrightability
>   * small enough that the consequences of throwing it
>     away are negligible.
>   * possibly other aspects ?



  * tightly coupled to a specific state of the QEMU code, and therefore
    original.

> 
> 
> > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> > index 65b8f232a08..857588c43ba 100644
> > --- a/docs/devel/code-provenance.rst
> > +++ b/docs/devel/code-provenance.rst
> > @@ -1,7 +1,7 @@
> >  .. _code-provenance:
> >  
> > -Code provenance
> > -===============
> > +Code provenance and AI usage
> > +============================
> 
> In retrospect, I wonder if we shouldn't have had "ai-usage.rst" as
> a separate doc from the start.  While we can hyperlink to sub-titles
> via anchors, it would be simpler if we could just point to a doc and
> not require scrolling past pages of non-AI text.
> 
> > @@ -288,62 +288,108 @@ content generators below.
> >  Use of AI-generated content
> >  ~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> > +Risks to open source projects include maintainer burnout from an
> > +increased number of contributions, as well as the risk to the project
> > +from unintentional inclusion of copyrighted material in the LLM's output.
> > +In order to mitigate these risks, the QEMU project currently allows
> > +using AI/LLM tools to produce patches in a limited set of scenarios:
> 
> If we're opening the door to AI assisted contribution, then IMHO we
> need to write about both the social and technical expectations.
> Admittedly that will expand the scope of your proposal here, but
> IMHO that's somewhat unavoidable. A significant part of the downsides
> of AI-assisted contributions comes from bad social practices, rather
> than merely bad technical practices. 
> 
> As a general theme, I would like us to emphasize at the start that the
> act of collaboration & contribution in QEMU is about the interaction,
> trust and relationships between humans, not bots.
> 
> 
> If someone wants to use tools (LLM based or not) that's a choice,
> but the accountability for actions needs to fall on a real human
> and there needs to be transparency whenever automation is used.
> 
> This starts from the commit message.  A good commit message (and even
> more so a good cover letter) describes the intent / thinking behind
> the changes.  An LLM doesn't think or have intent in its actions,
> ergo a human should be driving the authorship of commit messages /
> cover letters, where a non-trivial explanation is needed.
> 
> As reviewers, if we make use of LLM backed tools to respond, then
> we need to be transparent about any feedback that came from a bot
> rather than from a human.
> 
> As contributors, if a reviewer gives feedback, the contributor's
> response should be their own rather than just feeding the email
> review into an LLM and cut+pasting the LLM's answer back to the
> list.
> 
> The identity used to contribute to QEMU should reflect the human's
> identity. As previously clarified, this doesn't need to be a real
> name, but we don't want LLM agents being given a pseudonym to
> pretend to be a human.
> 
> > +**Mechanical changes**
> > +  If you can use a deterministic tool, it is preferred that you use it
> > +  and not replace it with AI. If you don't know how to do the change
> > +  deterministically, you can ask the AI for help.
> 
> > +**Small bug fixes**
> > +  These should be limited to 20 lines of code or less, not including
> > +  tests.  You are still expected to :ref:`understand and explain your changes
> > +  <write_a_meaningful_commit_message>` and the rationale behind them.
> 
> I think the "20 lines or less" is not doing a good job at expressing
> the intent behind this point. I'd like us to emphasize the
> "why" of this point, as that helps contributors & reviewers make a
> decision of whether a change is "within the spirit" of the rule or
> not.
> 
> >  
> > +**Documentation and code comments**
> > +  While AI can help draft text, it still requires significant human
> > +  oversight.  Pay attention to the organization and flow of the generated
> > +  text, and strictly fact-check all technical details as LLMs are prone
> > +  to being confidently wrong.
> 
> Docs is an area I'm more wary of from the social expectation side rather
> than the technical or legal side.  I don't feel like "pay attention to
> the organization and flow" really mitigates the tendency to produce
> vast reams of convincing sounding slop. There has always been a
> problem with docs of well intentioned contributors trying to write about
> stuff they don't really understand well enough. IOW they don't necessarily
> have the knowledge to fact check details either. As a maintainer, I've found
> that reviewing docs and asking for rewrites can be even more of a burden than
> code. IOW, encouraging use of AI for docs, in non-expert hands, has a strong
> potential for expanding the burden on maintainers.
> 
> I'd be more comfortable with AI tools for inline API docs, rather than
> AI tools for prose under docs/.
> 
> Not sure how to better word this point though ?
> 
> > +**Tests**
> > +  Note that you must still confirm that each test actually exercises
> > +  the intended behavior including, for regression tests, that it
> > +  fails without the code under test and passes for the right reason.
> >
> 
> > +If you wish to send large amounts of AI-generated changes, or any other
> > +contribution not in the above categories, please get in touch with the
> > +maintainer beforehand.  These can be treated as experiments, at the
> > +discretion of the maintainer and the community, with no obligation
> > +to accept them.
> 
> IMHO it should not be at the discretion of individual maintainers to
> accept large-scale AI authored changes outside these guidelines. To
> quote the commit message rationale
> 
>    "Therefore, it remains prudent to only permit AI assistance where
>     the ramifications of copyright violations are at least easy to
>     revert and unlikely to spread"
> 
> that does not suggest we should leave it to the discretion of maintainers
> to override the guidelines. 
> 
> > +**Use of AI does not remove the need for authors to comply with all
> > +other requirements for contribution.**  In particular, the
> > +``Signed-off-by`` label in a patch submission is a statement that
> > +the author takes responsibility for the entire contents of the patch,
> > +certifying that their patch submission is made in accordance with the
> > +rules of the `Developer's Certificate of Origin (DCO) <dco>`.
> 
> 
> This needs to be stronger language IMHO. The kernel has a more
> explicit statement explicitly forbidding agents from adding
> Signed-off-by on behalf of the human:
> 
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-assistants.rst?id=4bf85afb9f3ecd7c3b5d15a85b0902f8e725cd06#n27
> 
>   "Signed-off-by and Developer Certificate of Origin
>    =================================================
> 
>   AI agents MUST NOT add Signed-off-by tags. Only humans can legally
>   certify the Developer Certificate of Origin (DCO). The human submitter
>   is responsible for:
> 
>   * Reviewing all AI-generated code
>   * Ensuring compliance with licensing requirements
>   * Adding their own Signed-off-by tag to certify the DCO
>   * Taking full responsibility for the contribution"
> 
> 
> I think we should be similarly explicit that a human must take
> the action of adding S-o-b - it is not a rubber stamp to be
> automated by the AI.
> 
> This should be emphasized in the earlier part of the doc before
> the AI usage section where we described S-o-b usage.
> 
> 
> > +Commit messages for AI-assisted changes
> > +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >  
> > +When AI/LLM tools produce or substantively shape your patch, add an
> 
> "shape your patch" ->  "shape the content of the submitted patch"
> 
> as this better excludes the "background" usage mentioned below.
> 
> > +``AI-used-for:`` line before ``Signed-off-by``, as a reminder of your
> > +DCO obligations and a guide to reviewers.  The text is one or more of
> > +``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> > +explanation in parentheses:
> >  
> > +.. code-block:: none
> > +
> > +     AI-used-for: tests, docs
> > +     AI-used-for: code
> > +     AI-used-for: code (refactoring)
> > +     AI-used-for: code (prototype)
> > +     AI-used-for: research
> > +
> > +``AI-used-for`` should not be included for "background" usage such as
> > +autocomplete or obtaining a pre-review of the patch.
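
If this format sticks, it is simple enough to check mechanically.  A
rough sketch in Python, just to illustrate the grammar as I read it:
the function name is made up, and I am assuming the parenthesized
explanation attaches to each item rather than to the whole line, which
the proposed text does not actually specify.

```python
import re

# Categories listed in the proposed policy text.
ALLOWED = {"code", "tests", "docs", "research"}

# One item: a category word, optionally followed by "(explanation)".
ITEM_RE = re.compile(r"^(?P<kind>[a-z]+)(?:\s*\((?P<note>[^()]+)\))?$")


def parse_ai_used_for(line):
    """Return a list of (kind, note) pairs, or None if the line is malformed.

    Hypothetical helper, not part of any existing QEMU script.
    Limitation: items are split on commas, so an explanation that itself
    contains a comma is rejected.
    """
    prefix = "AI-used-for:"
    if not line.startswith(prefix):
        return None
    items = []
    for raw in line[len(prefix):].split(","):
        m = ITEM_RE.match(raw.strip())
        if m is None or m.group("kind") not in ALLOWED:
            return None
        items.append((m.group("kind"), m.group("note")))
    return items
```

With this reading, "AI-used-for: code (refactoring)" parses to one item
with a note, while an unknown category or an empty list is rejected.
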
> 
> This is an interesting idea that I like much more than Assisted-by,
> because it gives more directly useful info to the reviewer, without
> turning into free advertising for commercial vendors.
> 
> > +There is no requirement to include your prompts or summarize the
> > +conversation in the commit message or cover letter, but you may do so
> > +if you think it helps a reviewer judge the result.  For example:
> 
> IMHO we should actively discourage the inclusion of prompts
> entirely as it is the wrong information to provide. 
> 
> > +
> > +**Helpful prompts**
> > +  These describe concrete constraints or instructions, making it easy for a
> > +  reviewer to see how the tool's output was guided:
> > +
> > +  * "move field ``foo`` from ``struct aa`` to ``struct bb``.  If a
> > +    function already has a local variable or parameter of type ``struct
> > +    bb``, use it instead of accessing ``aa.bb``"
> > +
> > +  * "add an implementation of the trait for ``Mutex<T: MyTrait>``; it
> > +    takes the lock around the calls and forwards to ``T``"
> 
> These example prompts are just expressing an aspect that should
> already have been described in prose in the commit message. We
> don't need to classify them as "ai prompts" in a commit message,
> we just need the author to write a useful commit message.
> 
> > +**Unhelpful prompts**
> > +  These are too generic to provide meaningful context.  You can of course
> > +  use them in the context of a complex interaction with the LLM, but they
> > +  should not be included in the commit message:
> > +
> > +  * "write user-facing documentation for the new tool"
> > +
> > +  * "write testcases for the new functions"
> 
> Again this is just an illustration of an unhelpful commit message.
> Those would be equally useless in an entirely human authored patch.
> Just emphasize the writing of useful commit messages.
> 
> 
> > +QEMU does *not* use ``Assisted-by``, ``Co-authored-by`` or ``Generated-by``
> > +trailers to indicate AI usage.  In particular, it is not necessary to
> > +specify the exact AI model or tool used to create the commit.
> 
> "does not use" doesn't imply "forbidden".
> 
> IIUC, tools are liable to add these tags without the contributor
> even asking for them. If we don't want to be providing free
> advertising IMHO we should explicitly forbid use of these tags
> and validate this in checkpatch.pl
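
For what it is worth, the check itself is tiny.  A sketch in Python
rather than checkpatch.pl's Perl, only to show the shape of it: the
function name is hypothetical, and matching case-insensitively at the
start of a line is my assumption about how strict we would want to be.

```python
import re

# Trailers the proposed text says QEMU does not use for AI attribution.
FORBIDDEN = ("Assisted-by", "Co-authored-by", "Generated-by")

# Match any of the trailers at the start of a line, ignoring case.
TRAILER_RE = re.compile(
    r"^(%s):" % "|".join(re.escape(t) for t in FORBIDDEN),
    re.IGNORECASE | re.MULTILINE,
)


def forbidden_trailers(commit_message):
    """Return the forbidden trailer names found in a commit message.

    Hypothetical helper; names are returned as written in the message.
    """
    return [m.group(1) for m in TRAILER_RE.finditer(commit_message)]
```

A non-empty result would be reported as an error, the same way a
missing Signed-off-by is today.
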
> 
> Also any rules in this respect should be documented earlier in
> this file where we outline what tags we use in commit messages,
> either instead of, or in addition to, mentioning them under the
> AI usage guidelines.
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com       ~~        https://hachyderm.io/@berrange :|
> |: https://libvirt.org          ~~          https://entangle-photo.org :|
> |: https://pixelfed.art/berrange   ~~    https://fstop138.berrange.com :|
