On Thu, May 16, 2024 at 05:22:30PM +0100, Daniel P. Berrangé wrote:
> There has been an explosion of interest in so called AI code generators
> in the past year or two. Thus far though, this is has not been matched
> by a broadly accepted legal interpretation of the licensing implications
> for code generator outputs. While the vendors may claim there is no
> problem and a free choice of license is possible, they have an inherent
> conflict of interest in promoting this interpretation. More broadly
> there is, as yet, no broad consensus on the licensing implications of
> code generators trained on inputs under a wide variety of licenses
>
> The DCO requires contributors to assert they have the right to
> contribute under the designated project license. Given the lack of
> consensus on the licensing of AI code generator output, it is not
> considered credible to assert compliance with the DCO clause (b) or (c)
> where a patch includes such generated code.
>
> This patch thus defines a policy that the QEMU project will currently
> not accept contributions where use of AI code generators is either
> known, or suspected.
>
> This merely reflects the current uncertainty of the field, and should
> this situation change, the policy is of course subject to future
> relaxation. Meanwhile requests for exceptions can also be considered on
> a case by case basis.
>
> Signed-off-by: Daniel P. Berrangé <berra...@redhat.com>
> ---
> docs/devel/code-provenance.rst | 50 +++++++++++++++++++++++++++++++++-
> 1 file changed, 49 insertions(+), 1 deletion(-)
>
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> index eabb3e7c08..846dda9a35 100644
> --- a/docs/devel/code-provenance.rst
> +++ b/docs/devel/code-provenance.rst
> @@ -264,4 +264,52 @@ boilerplate code template which is then filled in to
> produce the final patch.
> The output of such a tool would still be considered the "preferred format",
> since it is intended to be a foundation for further human authored changes.
> Such tools are acceptable to use, provided they follow a deterministic process
> -and there is clearly defined copyright and licensing for their output.
> +and there is clearly defined copyright and licensing for their output. Note
> +in particular the caveats applying to AI code generators below.
> +
> +Use of AI code generators
> +~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +TL;DR:
> +
> + **Current QEMU project policy is to DECLINE any contributions which are
> + believed to include or derive from AI generated code. This includes ChatGPT,
> + CoPilot, Llama and similar tools**
> +
> +The increasing prevalence of AI code generators, most notably but not limited
> +to, `Large Language Models <https://en.wikipedia.org/wiki/Large_language_model>`__
> +(LLMs) results in a number of difficult legal questions and risks for software
> +projects, including QEMU.
> +
> +The QEMU community requires that contributors certify their patch submissions
> +are made in accordance with the rules of the :ref:`dco` (DCO).
> +
> +To satisfy the DCO, the patch contributor has to fully understand the
> +copyright and license status of code they are contributing to QEMU. With AI
> +code generators, the copyright and license status of the output is ill-defined
> +with no generally accepted, settled legal foundation.
> +
> +Where the training material is known, it is common for it to include large
> +volumes of material under restrictive licensing/copyright terms. Even where
> +the training material is all known to be under open source licenses, it is
> +likely to be under a variety of terms, not all of which will be compatible
> +with QEMU's licensing requirements.
> +
> +With this in mind, the QEMU project does not consider it is currently possible
> +for contributors to comply with DCO terms (b) or (c) for the output of commonly
> +available AI code generators.
> +
> +The QEMU maintainers thus require that contributors refrain from using AI code
> +generators on patches intended to be submitted to the project, and will
> +decline any contribution if use of AI is either known or suspected.
> +
> +Examples of tools impacted by this policy includes both GitHub's CoPilot,
> +OpenAI's ChatGPT, and Meta's Code Llama, amongst many others which are less
> +well known.
> +
> +This policy may evolve as the legal situation is clarifed. In the meanwhile,
> +requests for exceptions to this policy will be evaluated by the QEMU project
> +on a case by case basis. To be granted an exception, a contributor will need
> +to demonstrate clarity of the license and copyright status for the tool's
> +output in relation to its training model and code, to the satisfaction of the
> +project maintainers.

I would definitely want more contributors to pass their comments and commit logs through a grammar checker. It's unclear to me whether the contributors would be required to know whether the checker in question is considered "AI" or not.

> --
> 2.43.0