On Wed, Jun 03, 2026 at 03:59:35PM +0100, Daniel P. Berrangé wrote:
> On Fri, May 29, 2026 at 11:46:19AM +0200, Paolo Bonzini wrote:
> > The concern that motivated the policy is unchanged, and it is worth stating
> > precisely: the DCO is about whether the submitter has the legal right to
> > contribute the code, not about "creative expression". While the status of
> > LLM output seems to be converging towards non-copyrightability, questions
> > around unintentional reproduction of copyrighted code are still open.
> > What has shifted is the balance of risk:
> >
> > - projects accepting AI-assisted content have not run into serious
> >   legal trouble so far, which suggests the probability of the risk
> >   materializing is not high;
>
> "so far" is doing a lot of heavy lifting here & generally I think this
> rather over-estimates the speed at which legal issues might arise.
> Copyright infringement is a "slow burn" where the risk accumulates
> over time and issues, if discovered, may not be litigated immediately.
>
> That is NOT to say the risk is high. The risk may well still be
> low. I'm just saying that there's not been sufficient time to use
> "lack of lawsuits" as a rationalization IMHO.
>
> > - other organizations, such as Red Hat[1], have assessed the risk as
> >   acceptable -- though a community of individual developers does not
> >   have the legal backing of a company, and even an unfounded dispute
> >   would be a long-lasting distraction from work on QEMU.
> >
> > Nevertheless, even Red Hat mentions that "the possibility of occasional
> > replication cannot be ignored". In QEMU's view, attentiveness and
> > oversight are not a practical way to address this; yet as a copyleft
> > project, copyright and code provenance are of utmost importance to us.
> >
> > Therefore, it remains prudent to only permit AI assistance where the
> > ramifications of copyright violations are at least easy to revert and
> > unlikely to spread: tests, documentation, mechanical changes, and small
> > bug fixes. Core code that other things depend on, and that cannot
> > simply be thrown away once a problem is noticed long after the fact,
> > stays off-limits without prior agreement from a maintainer.
>
> The interaction of "small bug fixes" and "core code" doesn't
> fit well IMHO. A "bug fix" describes an action, but the code
> that is changed is usually a "feature" and will often be a
> "core" part of something in QEMU.
>
> IIUC, by "small bug fixes", what you're actually trying to
> express is an acceptance of code that is either
>
>  * unlikely to meet the threshold for copyrightability
>  * small enough that the consequences of throwing it
>    away are negligible.
>  * possibly other aspects ?

tightly coupled to the specific state of the QEMU code, and so original.

> >
> > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> > index 65b8f232a08..857588c43ba 100644
> > --- a/docs/devel/code-provenance.rst
> > +++ b/docs/devel/code-provenance.rst
> > @@ -1,7 +1,7 @@
> >  .. _code-provenance:
> >
> > -Code provenance
> > -===============
> > +Code provenance and AI usage
> > +============================
>
> In retrospect, I wonder if we shouldn't have had "ai-usage.rst" as
> a separate doc from the start. While we can hyperlink to sub-titles
> via anchors, it would be simpler if we could just point to a doc and
> not require scrolling past pages of non-AI text.
>
> > @@ -288,62 +288,108 @@ content generators below.
> >  Use of AI-generated content
> >  ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> > +Risks to open source projects include maintainer burnout from an
> > +increased number of contributions, as well as the risk to the project
> > +from unintentional inclusion of copyrighted material in the LLM's output.
> > +In order to mitigate these risks, the QEMU project currently allows
> > +using AI/LLM tools to produce patches in a limited set of scenarios:
>
> If we're opening the door to AI assisted contribution, then IMHO we
> need to write about both the social and technical expectations.
> Admittedly that will expand the scope of your proposal here, but
> IMHO that's somewhat unavoidable. A significant part of the downsides
> of AI-assisted contributions comes from bad social practices, rather
> than merely bad technical practices.
>
> As a general theme, I would like us to emphasize at the start that the
> act of collaboration & contribution in QEMU is about the interaction,
> trust and relationships between humans, not bots.
>
> If someone wants to use tools (LLM based or not) that's a choice,
> but the accountability for actions needs to fall on a real human
> and there needs to be transparency whenever automation is used.
>
> This starts from the commit message. A good commit message (and even
> more so a good cover letter) describes the intent / thinking behind
> the changes. An LLM doesn't think or have intent in its actions,
> ergo a human should be driving the authorship of commit messages /
> cover letters, where a non-trivial explanation is needed.
>
> As reviewers, if we make use of LLM backed tools to respond, then
> we need to be transparent about any feedback that came from a bot
> rather than from a human.
>
> As contributors, if a reviewer gives feedback, the contributor's
> response should be their own rather than just feeding the email
> review into an LLM and cut+pasting the LLM's answer back to the
> list.
>
> The identity used to contribute to QEMU should reflect the human's
> identity. As previously clarified, this doesn't need to be a real
> name, but we don't want LLM agents being given a pseudonym to
> pretend to be a human.
>
> > +**Mechanical changes**
> > +  If you can use a deterministic tool, it is preferred that you use it
> > +  and not replace it with AI. If you don't know how to do the change
> > +  deterministically, you can ask the AI for help.
>
> > +**Small bug fixes**
> > +  These should be limited to 20 lines of code or less, not including
> > +  tests. You are still expected to :ref:`understand and explain your changes
> > +  <write_a_meaningful_commit_message>` and the rationale behind them.
>
> I think the "20 lines or less" is not doing a good job at expressing
> the intent behind this point. I'd like us to emphasize the "why" of
> this point, as that helps contributors & reviewers make a decision
> of whether a change is within the spirit of the rule or not.
>
> > +**Documentation and code comments**
> > +  While AI can help draft text, it still requires significant human
> > +  oversight.
> > +  Pay attention to the organization and flow of the generated
> > +  text, and strictly fact-check all technical details as LLMs are prone
> > +  to being confidently wrong.
>
> Docs are an area I'm more wary of from the social expectation side rather
> than the technical or legal side. I don't feel like "pay attention to
> the organization and flow" really mitigates the tendency to produce
> vast reams of convincing-sounding slop. There has always been a
> problem with docs of well-intentioned contributors trying to write about
> stuff they don't really understand well enough. IOW they don't necessarily
> have the knowledge to fact-check details either. As a maintainer, I've found
> that reviewing docs and asking for rewrites can be even more of a burden than
> code. IOW, encouraging use of AI for docs, in non-expert hands, has a strong
> potential for expanding the burden on maintainers.
>
> I'd be more comfortable with AI tools for inline API docs, rather than
> AI tools for prose under docs/.
>
> Not sure how to better word this point though ?
>
> > +**Tests**
> > +  Note that you must still confirm that each test actually exercises
> > +  the intended behavior including, for regression tests, that it
> > +  fails without the code under test and passes for the right reason.
>
> > +If you wish to send large amounts of AI-generated changes, or any other
> > +contribution not in the above categories, please get in touch with the
> > +maintainer beforehand. These can be treated as experiments, at the
> > +discretion of the maintainer and the community, with no obligation
> > +to accept them.
>
> IMHO it should not be at the discretion of individual maintainers to
> accept large-scale AI authored changes outside these guidelines.
> To quote the commit message rationale
>
>   "Therefore, it remains prudent to only permit AI assistance where
>    the ramifications of copyright violations are at least easy to
>    revert and unlikely to spread"
>
> that does not suggest we should leave it to the discretion of maintainers
> to override the guidelines.
>
> > +**Use of AI does not remove the need for authors to comply with all
> > +other requirements for contribution.** In particular, the
> > +``Signed-off-by`` label in a patch submission is a statement that
> > +the author takes responsibility for the entire contents of the patch,
> > +certifying that their patch submission is made in accordance with the
> > +rules of the `Developer's Certificate of Origin (DCO) <dco>`.
>
> This needs to be stronger language IMHO. The kernel has a more
> explicit statement forbidding agents from adding Signed-off-by
> on behalf of the human:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-assistants.rst?id=4bf85afb9f3ecd7c3b5d15a85b0902f8e725cd06#n27
>
>   "Signed-off-by and Developer Certificate of Origin
>    =================================================
>
>    AI agents MUST NOT add Signed-off-by tags. Only humans can legally
>    certify the Developer Certificate of Origin (DCO). The human submitter
>    is responsible for:
>
>    * Reviewing all AI-generated code
>    * Ensuring compliance with licensing requirements
>    * Adding their own Signed-off-by tag to certify the DCO
>    * Taking full responsibility for the contribution"
>
> I think we should be similarly explicit that a human must take
> the action of adding S-o-b - it is not a rubber stamp to be
> automated by the AI.
>
> This should be emphasized in the earlier part of the doc before
> the AI usage section where we describe S-o-b usage.
>
> > +Commit messages for AI-assisted changes
> > +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> > +When AI/LLM tools produce or substantively shape your patch, add an
>
>   "shape your patch" -> "shape the content of the submitted patch"
>
> as this better excludes the "background" usage mentioned below.
>
> > +``AI-used-for:`` line before ``Signed-off-by``, as a reminder of your
> > +DCO obligations and a guide to reviewers. The text is one or more of
> > +``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> > +explanation in parentheses:
>
> > +.. code-block:: none
> > +
> > +  AI-used-for: tests, docs
> > +  AI-used-for: code
> > +  AI-used-for: code (refactoring)
> > +  AI-used-for: code (prototype)
> > +  AI-used-for: research
> > +
> > +``AI-used-for`` should not be included for "background" usage such as
> > +autocomplete or obtaining a pre-review of the patch.
>
> This is an interesting idea that I like much more than Assisted-by,
> because it gives more directly useful info to the reviewer, without
> turning into free advertising for commercial vendors.
>
> > +There is no requirement to include your prompts or summarize the
> > +conversation in the commit message or cover letter, but you may do so
> > +if you think it helps a reviewer judge the result. For example:
>
> IMHO we should actively discourage the inclusion of prompts
> entirely as it is the wrong information to provide.
>
> > +
> > +**Helpful prompts**
> > +  These describe concrete constraints or instructions, making it easy for a
> > +  reviewer to see how the tool's output was guided:
> > +
> > +  * "move field ``foo`` from ``struct aa`` to ``struct bb``.
> > +    If a function already has a local variable or parameter of type
> > +    ``struct bb``, use it instead of accessing ``aa.bb``"
> > +
> > +  * "add an implementation of the trait for ``Mutex<T: MyTrait>``; it
> > +    takes the lock around the calls and forwards to ``T``"
>
> These example prompts are just expressing an aspect that should
> already have been described in prose in the commit message. We
> don't need to classify them as "AI prompts" in a commit message,
> we just need the author to write a useful commit message.
>
> > +**Unhelpful prompts**
> > +  These are too generic to provide meaningful context. You can of course
> > +  use them in the context of a complex interaction with the LLM, but they
> > +  should not be included in the commit message:
> > +
> > +  * "write user-facing documentation for the new tool"
> > +
> > +  * "write testcases for the new functions"
>
> Again this is just an illustration of an unhelpful commit message.
> Those would be equally useless in an entirely human-authored patch.
> Just emphasize the writing of useful commit messages.
>
> > +QEMU does *not* use ``Assisted-by``, ``Co-authored-by`` or ``Generated-by``
> > +trailers to indicate AI usage. In particular, it is not necessary to
> > +specify the exact AI model or tool used to create the commit.
>
> "does not use" doesn't imply "forbidden".
>
> IIUC, tools are liable to add these tags without the contributor
> even asking for them. If we don't want to be providing free
> advertising IMHO we should explicitly forbid use of these tags
> and validate this in checkpatch.pl.
>
> Also any rules in this respect should be documented earlier in
> this file where we outline what tags we use in commit messages,
> either instead of, or in addition to, mentioning them under the
> AI usage guidelines.
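
To make the checkpatch.pl suggestion concrete, here is a rough sketch of
what such a check could look like. It is written in Python purely for
illustration (checkpatch.pl itself is Perl), and the function name
check_trailers, the regexes and the error strings are all invented here,
not taken from the patch. It assumes the three trailers are rejected
outright, as suggested above, and that AI-used-for accepts only the
values listed in the proposal. It also scans every line of the message
rather than just the trailer block, which a real check would probably
want to tighten.

```python
import re

# Assumption: these trailers are rejected outright, per the suggestion above.
FORBIDDEN_TRAILERS = {"assisted-by", "co-authored-by", "generated-by"}
# Values listed for AI-used-for in the proposed documentation.
ALLOWED_USES = {"code", "tests", "docs", "research"}

# "Key: value" lines; keys are letters and hyphens only.
TRAILER_RE = re.compile(r"^([A-Za-z][A-Za-z-]*):\s*(.*)$")
# "<use>[, <use>...]" optionally followed by "(explanation)".
AI_USED_FOR_RE = re.compile(r"^([^()]+?)\s*(?:\((.+)\))?$")


def check_trailers(message):
    """Return a list of problems found in a commit message's trailers."""
    problems = []
    seen_sob = False
    for line in message.splitlines():
        match = TRAILER_RE.match(line.strip())
        if match is None:
            continue
        key, value = match.group(1).lower(), match.group(2).strip()
        if key in FORBIDDEN_TRAILERS:
            problems.append(f"forbidden trailer: {match.group(1)}")
        elif key == "signed-off-by":
            seen_sob = True
        elif key == "ai-used-for":
            # The proposal asks for AI-used-for before Signed-off-by.
            if seen_sob:
                problems.append("AI-used-for must come before Signed-off-by")
            parsed = AI_USED_FOR_RE.match(value)
            uses = parsed.group(1).split(",") if parsed else []
            bad = [u.strip() for u in uses if u.strip() not in ALLOWED_USES]
            if not uses or bad:
                problems.append(f"invalid AI-used-for value: {value!r}")
    return problems
```

One thing the sketch makes obvious: rejecting Co-authored-by outright
would also catch patches with a human co-author, so the actual rule may
need to be narrower than a plain tag blacklist.
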
>
> With regards,
> Daniel
> --
> |: https://berrange.com ~~ https://hachyderm.io/@berrange :|
> |: https://libvirt.org ~~ https://entangle-photo.org :|
> |: https://pixelfed.art/berrange ~~ https://fstop138.berrange.com :|
