On Fri, Jun 05, 2026 at 05:25:36AM -0400, Michael S. Tsirkin wrote:
> On Fri, Jun 05, 2026 at 10:17:16AM +0100, Daniel P. Berrangé wrote:
> > On Thu, Jun 04, 2026 at 12:37:58PM +0200, Paolo Bonzini wrote:
> > > On Wed, Jun 3, 2026 at 19:54, Daniel P. Berrangé <[email protected]> wrote:
> > >
> > > > The AI policy should just
> > > > make a point that we expect to be communicating with people not
> > > > bots pretending to be people.
> > > >
> > >
> > > Yes, it's better to have that stated clearly.
> > >
> > > > > True but we also need a rule. The spirit is better explained elsewhere
> > > > > (and also, building consensus on spirit vs. a rule are two different
> > > > > things).
> > > >
> > > > Do we have a better elsewhere in this case? It is a point specifically
> > > > about intent of the AI policy rule.
> > > >
> > >
> > > The rule in this draft says 20 lines, tests, mechanical changes and docs.
> > > The spirit is what is in the commit message, basically to maximize the
> > > benefit and limit the possible damage?
> >
> > Putting "the spirit" in the commit message is essentially /dev/null to
> > anyone reading the policy later.
> >
> > > > > See my reply to Peter elsewhere in the thread. I agree with your
> > > > > concerns for both docs and discretion, but I had specific uses in mind
> > > > > that I'd like to allow.
> > > > >
> > > > > For docs:
> > > > > - create tutorials and/or feature documentation based on functional
> > > > >   tests
> > > >
> > > > That doesn't sound too appealing to me. Reverse engineering docs or
> > > > tutorials from our functional tests is exactly the kind of thing that
> > > > feels likely to result in voluminous text of marginal value which will
> > > > have a large burden on reviewers.
> > > >
> > >
> > > At the same time this can be helpful for maintainers themselves? Let's
> > > also look at this from the point of view of producing better output, not
> > > just from that of being on the receiving end of slop. Especially for docs
> > > I have a hard time imagining people sending out whole new "manuals"...
> > > The bugfixes rule ironically seems the most dangerous to me from the
> > > Dunning-Kruger point of view.
> > >
> > > My question is: do we want disclosure for anything that is created with
> > > the help of LLMs, even if only small parts survive untouched? I think so,
> > > because a lot more, even if edited, would still be originally from AI.
> > > But then it's important to have rules allowing it and a way to track it.
> >
> > IMHO need unconditional disclosure, because the use of the LLM impacts
> > the license of the code. QEMU is traditionally expected to be GPLv2+
> > licensed for all new code, but there's the train of thought that LLM
> > code is public domain.
> >
> > If it gets human editing afterwards we can
> > consider that the human edits are GPLv2+ licensed, but IMHO we still
> > want to know the origins.
>
> Wait that's a big ask.
>
> DOC explicitly does not ask if code might be available anywhere else
> under any other license. Just that contributor can contribute under GPL.
> If it's public domain then the human can license it under GPL.

For new files, in checkpatch we validate that SPDX-License-Identifier
is explicitly set as GPL-2.0-or-later. Contributors are expected to
justify any divergence in the commit message.

I've seen guidance that SPDX-License-Identifier for AI output code
should NOT state a license, under the theory it is public domain.
If it is human edited though, I would expect that to overrule this
guidance, and the SPDX tag to explicitly state GPL-2.0-or-later,
unless the contributor wants to explicitly put their own edits
under public domain too.

Ultimately QEMU is a copyleft project as a whole and IMHO we should
prioritize retaining that for as large a portion of the codebase as
is practical.

> > > It would definitely be intended for merge. There's a lot of boilerplate
> > > code in the Rust bindings, for example, that is voluminous but *mostly*
> > > lacks creativity---the creative part basically can be described by the
> > > spec/docs and should already clear the low bar required for originality,
> > > even if the code is automatically generated. I included a couple examples
> > > in my reply to Peter.
> >
> > So we know there are examples which are probably low risk from a license
> > POV, but which are massively larger than 20 lines of code. This just
> > makes me more uncomfortable with the 20 line rule as the definition of
> > the policy - we know that rule is wrong / undesirable from the start and
> > needs this exception to make it viable.
>
> So 20 lines or mechanical changes? what is considered mechanical will be
> decided by maintainers, contributor should check with them up front.

If we want to allow mechanical changes / boilerplate, then we should
express that in the policy such that the policy can be reasonably
understood without having to ask permission / questions ahead of time.

With regards,
Daniel
--
|: https://berrange.com ~~ https://hachyderm.io/@berrange :|
|: https://libvirt.org ~~ https://entangle-photo.org :|
|: https://pixelfed.art/berrange ~~ https://fstop138.berrange.com :|
