Hi folks,

I'm emailing to ask for opinions on adding a page about AI-generated contributions to the docs. The ASF has its own guidance[1], which is fairly high-level and mainly concerned with licensing. However, we are seeing more AI-generated contributions where the author doesn't seem to have engaged with the code at all and appears to have no intention of responding to review comments. I think it would help to have a page in the docs that we can point to when we close such a pull request.
Having guidelines also makes it easier to tell whether a contributor has made any effort to follow them. I've experimented with ways of being transparent about AI use in my own PRs. One example is [2] (see the resolved comments), where the changes were needed but the subject matter was a little outside my comfort zone.

I've made a rough draft[3] of what could constitute some guidelines, and I'm keen to hear what folks think. Happy to hear thoughts on the wording, on whether this belongs in the contributor guide, or on any concerns I haven't considered.

Nic

[1] https://www.apache.org/legal/generative-tooling.html
[2] https://github.com/apache/arrow/pull/48634
[3] Draft:

We recognise that AI coding assistants are now a regular part of many developers' workflows and can improve productivity. Thoughtful use of these tools can be beneficial, but AI-generated PRs can sometimes create additional burden for maintainers. Human-generated mistakes tend to be easier to spot and reason about, and code review often feels like a collaborative learning experience that benefits both submitter and reviewer. When a PR appears to have been generated without much engagement from the submitter, it can feel like work that the maintainer might as well have done themselves.

We are not opposed to the use of AI tools in generating PRs, but recommend the following:

- Only take on a PR if you are able to debug and own the changes yourself
- Make sure that the PR title and body match the style and length of others in this repo
- Follow the coding conventions used in the rest of the codebase
- Be upfront about AI usage and summarise what was AI-generated
- If there are parts you don't fully understand, add inline comments explaining what steps you took to verify correctness
- Reference any sources that guided your changes (e.g. "took a similar approach to #123456")

PR authors are also responsible for disclosing any copyrighted materials in submitted contributions, as discussed in the ASF generative tooling guidance: https://www.apache.org/legal/generative-tooling.html

If a PR appears to be AI-generated and the submitter hasn't engaged with the output, doesn't respond to review feedback, or hasn't disclosed AI usage, we may close it without further review.
