Hi Mark,

On Mon, Dec 15, 2025 at 12:25 PM Mark Wielaard <[email protected]> wrote:
> On Thu, 2025-12-11 at 23:35 -0500, Aaron Merey wrote:
> > I'd like to propose an elfutils policy for contributions containing
> > content generated by LLM or AI tools (AI-assisted contributions).  A
> > written policy will help clarify for contributors whether elfutils
> > accepts AI-assisted contributions and whether any special procedures
> > apply.
>
> I think it would be good to differentiate between LLM generated
> contributions and [AI] tool assisted contributions. The first seems
> easy to define and is about whether or not to accept such generated
> patches, while the latter seems a very broad topic that is mostly
> about what tools a developer might use personally, most of which we
> don't need a policy for.

That's a fair distinction.  The policy can be reworded so that it
addresses contributions containing LLM-generated content (beyond
accessibility aids) instead of AI tooling in general.

> > There isn't a consensus across major open source projects on whether
> > AI-assisted contributions should be allowed.  For example, Binutils
> > [1], Gentoo [2], and man-pages [3] have adopted policies rejecting
> > most or all AI-assisted contributions.
>
> There have also been discussions by glibc and gcc about adopting a
> policy similar to the one binutils has on LLM Generated Content.
>
> > Fedora [4] and the Linux Foundation [5] have policies permitting the
> > use of AI-assisted contributions.  Contributors are expected to
> > disclose the use of any AI tools and take responsibility for the
> > contribution's quality and license compatibility.
>
> The Fedora one is largely not about using AI for code contributions.
> The Linux Foundation one lets each developer try to figure out if
> there are (legal) issues or not. Both feel like they don't really
> give any guidance but let every individual try to figure it out
> themselves.

What stood out to me was that these policies do not unconditionally
ban contributions containing LLM content.  Such content may be
acceptable when there is disclosure, license compatibility, and an
absence of incompatible third party content.  Of course, the concern
for maintainers is how we know when all of these conditions are met.

The point I want to make is that this is a type of risk we already
manage with human-authored content.  We rely on good-faith contributor
attestations and a review process that can flag suspect content.  If a
patch is highly questionable or too burdensome to review, then we can
refuse it.  If a contribution is found to be infringing after being
accepted, then we revert it.

> > In my opinion, elfutils should permit AI-assisted contributions.  As
> > for specific policies, I suggest the following.
> >
> > (1) AI-assisted contributions should include a disclosure that some or
> > all of the contribution was generated using an AI tool.  The git
> > commit tag "Assisted-by:" has been adopted for this purpose by Fedora,
> > for instance.
>
> I think this is too weak. The tag or comment should at least explain
> how to replicate the generated content, which isn't very practical,
> or probably even possible, with the current generation of LLM
> chatbots. I do think it is appropriate for deterministic tooling
> though, so as to have a recipe to replicate specific code changes.

Reproduction steps for deterministic tools, and prompts or
conversation summaries for LLMs, are fine with me.

I want to note that reproducibility isn't always required when we
accept a patch.  After all, not all human-authored changes are based
on a process that's reproducible in practice, and I don't think we
need to introduce this requirement just for LLM content.
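
For example (the exact wording and tool name below are just
illustrative, not a proposed required format), the tail end of a
commit message with such a disclosure might look like:

    The initial draft of this change was generated with an LLM
    chatbot; the prompts used are summarized in the paragraph above.
    I reviewed, adjusted, and tested the result before posting.

    Assisted-by: ChatGPT
    Signed-off-by: Jane Developer <[email protected]>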

> > (2) AI-assisted contributions should otherwise be treated like any
> > other contribution.  The contributor vouches for the quality of their
> > contribution and verifies license compatibility with their DCO
> > "Signed-off-by:" tag while reviewers evaluate the technical merits of
> > the contribution.
>
> Yes, but I think this just says no such contributions can have a
> Signed-off-by tag since, at least for LLM chatbot generated patches,
> the copyright status is unclear and so a contributor cannot
> truthfully sign off on them.

ChatGPT, for example, includes the following statement in its terms of use [1]:

"Ownership of content. As between you and OpenAI, and to the extent
permitted by applicable law, you (a) retain your ownership rights in
Input and (b) own the Output. We hereby assign to you all our right,
title, and interest, if any, in and to Output. ... Our assignment
above does not extend to other users’ output or any Third Party
Output."

If a contributor uses ChatGPT to help prepare a patch and takes
reasonable care to avoid including third party content, I think the
contributor can reasonably sign the DCO.  There is, of course, valid
disagreement about this.  Projects such as QEMU [2] have policies
rejecting LLM content due to the uncertainty of DCO claims.  On the
other hand, Chris Wright and Richard Fontana [3] argue that the DCO
can be compatible with LLM content.

For elfutils, we already handle a baseline level of uncertainty
regarding DCO claims, and I believe LLM content generally fits within
this existing risk profile.

> > (3) Maintainers may reject contributions at their discretion.
> > Rejection can occur if a contribution is unnecessary, low quality, or
> > creates an excessive review burden, for example.  Maintainer
> > discretion to accept or refuse a contribution has always applied, but
> > it may be worth stating this in the policy.
>
> Yes, this is good.
>
> > I'm interested in hearing what others think about this.  Elfutils has
> > already accepted AI-assisted contributions.
>
> But was that a good idea? Or did that just cause a lot of extra review
> time because of the unclear provenance of those contributions?

The policy's emphasis on maintainer discretion is important here.  In
practice, the rate at which patches carrying an "Assisted-by:" tag are
declined may well turn out to be higher than average.

> >   This proposal formalizes
> > the status quo and is broadly aligned with the policies of Fedora and
> > the Linux Foundation.
>
> I would lean the other way and adopt a simple policy like the rest of
> the core toolchain projects are adopting to reject LLM generated
> contributions for which the provenance cannot be determined (because
> the training corpus and/or algorithm is unknown).

These provenance concerns are fair, but can they be accommodated by
our existing practices?  We do not verify the provenance of many
human-authored contributions either.  Human-authored contributions can
be based on unreproducible processes and may accidentally include
third party material.  We already manage these risks, with reverting
patches as the fallback when something slips through review.

Aaron

[1] https://openai.com/policies/row-terms-of-use/
[2] https://www.qemu.org/docs/master/devel/code-provenance.html#use-of-ai-content-generators
[3] https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
