Re: Discuss/proposal: Update our AI coding policy to "forbid" agents opening PRs (not banning LLM generated-code)

Jens Scheffler Wed, 10 Jun 2026 11:43:46 -0700

Hi,

I was watching the mail train and I think that sounds good. I hope the check can be made early, e.g. during build info, and if possible can we (once the PR is set to DRAFT) kill all successor steps to save CI capacity?

Otherwise I hope we can be as constructive as possible: not "fighting fire with fire", but rather aiming to improve agent descriptions to optimize others' token budgets in favor of our requirements. We cannot turn back time and need to assume the current level of agent contributions is here to stay.


Jens

On 10.06.26 08:55, Jarek Potiuk wrote:

Hi everyone,

I’ve spent some time reflecting on all the great points raised here. Our
shared goals are to ensure human ownership and review, keep agents as
helpful assistants rather than sole authors, and reduce the cognitive load
from long AI-generated descriptions.

I really like Shahar's proposal and would love to build on it with a few
suggestions to make our expectations clear and supportive for our human
contributors:

   - Explicit Instructions: Let’s be very open in our templates and
AGENTS.md. We can instruct agents to pause and ask the human to write the
description, noting that this personal touch is essential for the PR to
stay open.
   - Human Review Checkbox: I suggest adding a checkbox: "- [ ] I have
reviewed this code myself." We’ll instruct agents to leave this for the
human to check, ensuring that vital moment of reflection.
   - Instead of copy-pasting (which I find awkward), we can instruct the
agents to use `gh pr create --web`, `--template` (to include the template),
and `--draft` (following Pierre's idea). This creates natural checkpoints
(filling the description, checking the box, clicking submit, and
undrafting) that encourage human involvement.

We should also state the consequences for non-compliance: To keep our queue
healthy, we should use an automated process to close PRs that miss these
steps, with a note explaining how to resubmit them with human input.
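
A minimal sketch of what such an automated check could look like, assuming
the checkbox wording suggested above ends up in the template unchanged. The
function names and the returned action strings are hypothetical, and the
sketch only inspects the PR description text; it does not call any GitHub
API.

```python
import re

# Assumption: the checkbox label matches the wording proposed in this thread.
REVIEW_LABEL = "I have reviewed this code myself"

# Matches a Markdown task-list item such as "- [ ] label" or "- [x] label".
_CHECKBOX = re.compile(
    r"^\s*[-*]\s*\[(?P<mark>[ xX])\]\s*" + re.escape(REVIEW_LABEL),
    re.MULTILINE,
)


def human_review_checked(pr_body: str) -> bool:
    """Return True only if the review checkbox is present and ticked."""
    match = _CHECKBOX.search(pr_body)
    return bool(match) and match.group("mark") in "xX"


def triage_action(pr_body: str) -> str:
    """Hypothetical decision step: keep the PR open or close it with a note."""
    if human_review_checked(pr_body):
        return "keep-open"
    return "close-with-note"
```

A deterministic check like this can run for every PR, maintainer PRs
included, without involving an LLM at runtime.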

All those expectations and closing etc. should be equally applied to all
PRs, including maintainer PRs. This will also allow those of us who use
agents to monitor the process and refine the instructions if we see any
loopholes that agents try to bypass or learn how to circumvent. This will
allow us to continuously improve the process.

I believe this approach balances productivity with the high-quality human
collaboration we all value.

What do you think?

Best regards,

Jarek


On Tue, Jun 9, 2026 at 5:00 PM Shahar Epstein <[email protected]> wrote:

Here's a more concrete suggestion:

Updating the PR template in such a way that:
1. Human summary is now a MUST - at least a oneliner* (or more, depending
on the scope - TBD) that describes the suggested changes written by the
PR's author themselves (without AI assistance).
2. AI summary is optional. However, when included - it MUST be bound within
a collapsible box, mainly to save cognitive load for maintainers and
contributors, but also to encourage human interaction like we used to do
before it all started.
3. PR's author (human) should be the one declaring the AI usage checkbox -
added a short statement of ownership.

Contributors will be instructed to use this template and adhere to the
instructions when creating a PR.
Agents may push branches to forks, but they will be instructed to avoid
creating PRs on their own to the upstream repository, and instead provide
the link for creating the PR using this template (they could suggest an AI
summary, but the contributor should copy and paste it manually to the
collapsible box). Trying to work around that might result in an M&M test
directly in the PR's description (TBD).
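
To make the template rules above concrete, here is a rough sketch of a
description validator. The section headings ("## Human summary" and
"## AI summary") are assumptions for illustration only; the real template in
the example PR may use different markers. The check can confirm that a human
summary is present and that any AI summary sits inside a collapsible
`<details>` block, but it cannot verify that the summary was written without
AI assistance.

```python
import re

# Assumption: hypothetical section headings, not taken from the real template.
HUMAN_HEADING = "## Human summary"
AI_HEADING = "## AI summary"


def strip_html_comments(text: str) -> str:
    """Remove <!-- ... --> template hints so they do not count as content."""
    return re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)


def section(body: str, heading: str):
    """Return the text under `heading` up to the next '## ' heading, or None."""
    lines = body.splitlines()
    for index, line in enumerate(lines):
        if line.strip() == heading:
            collected = []
            for following in lines[index + 1:]:
                if following.startswith("## "):
                    break
                collected.append(following)
            return "\n".join(collected).strip()
    return None


def validate_description(body: str) -> list:
    """Return a list of problems; an empty list means the description passes."""
    problems = []
    body = strip_html_comments(body)
    if not section(body, HUMAN_HEADING):
        problems.append("missing human summary")
    ai_summary = section(body, AI_HEADING)
    if ai_summary:
        # Anything left after removing <details> blocks is uncollapsed AI text.
        outside = re.sub(
            r"<details\b[^>]*>.*?</details>", "", ai_summary, flags=re.DOTALL
        ).strip()
        if outside:
            problems.append("AI summary must be inside a <details> block")
    return problems
```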

Example is available here <https://github.com/apache/airflow/pull/68055> -
I've made HTML comments visible, they will be hidden in the real thing.

Took inspiration for this idea from https://tenbluelinks.org/ , that hides
the AI overview on Google if you're not interested (highly-recommended
btw).

Can we live with that?


Shahar

On Tue, Jun 9, 2026 at 3:30 PM Ash Berlin-Taylor <[email protected]> wrote:

I don’t care one way or another about using AI as a tool in CI, that is
secondary to my goal which is to try and do something to make it clear what
we expect from people wanting to contribute to Airflow, namely:

1. Human involvement.

By submitting a PR you are saying “yes I want to be a member of the
community”. Agents submitting without human interaction go against this.

2. Human ownership.

It is _your responsibility_ as the PR author to follow up on it, address
comments, and request reviews.


I frankly find the AI-generated triage comments verbose, a waste of
time, and pure noise even without the `@` spam.

If the user doesn’t care enough about their own PR to follow up on it:
close it after some time. We don’t need to babysit them. Nor do I need yet
more commit email messages to read through.


So how does it sound: It sounds like hell to me and an even bigger waste
of electricity in a climate crisis.

I want to be involved in a community of humans working to build software.
I do not want to see LLMs producing so much output that other people need
LLMs to summarise it, with no humans looking at things.

-ash

On 9 Jun 2026, at 13:18, Jarek Potiuk <[email protected]> wrote:

> Why? Because AI “instructions” cannot be trusted. And I am after a signal
> that people are blindly using LLMs without enough human introversion.

But is not that what you are doing? This proposal is about adding another
AI instruction (just hidden in HTML) - how is that going to help?

> You already updated the instructions to not `@` the reviewer here

Indeed, LLMs are not deterministic by nature. But they are improvable.
Through iterations of refinement and adding more guardrails we can improve
it, and this is exactly why I am running it manually to make it better. This
is the same as in regular breeze development in the past. Initially, there
were many small issues (and I remember how you complained about them and
how unnecessary they seemed), yet we have now perfected it over time. Now, it
allows all contributors and maintainers to work much more efficiently and
lose less time. BTW, thanks for notifying me; I must strengthen this one
and see why, as there might be another improvement to implement. This is
also why we are not "yet" doing CI analysis by AI - because I want to
iterate on it and fix it in a way that tells us which parts are
deterministic.

> I want to do anything and everything to reduce the drive by contribution
> with no human activity. I’m happy to spend my time helping humans, but if
> they are just going to feed that back to an LLM and burn an egregious
> amount of carbon: no thank you.

And again I am not sure how the proposal to add that instruction would
address this particular issue? Are you just proposing to add another
instruction for the LLM (or am I wrong?). How does it solve the problem?

From what I understand we have two basic proposals here - that contradict
each other:

* Ash - do not use AI to fight with AI at all
* Amogh, Shahar - use AI in CI

But I think, the triage I am running now shows a third way:

* we use AI to try out and generate triage actions and figure out which
parts are practically 100% deterministic and can help with triage (these
are the stats I am gathering now)
* we use AI to convert the SKILLS we have into deterministic CI code that
does those triage steps (no AI used at all at runtime)
* we continue perfecting the manually-triggered AI SKILLS to get more AI
heuristics that we can turn into deterministic CI code
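
As one illustration of turning a skill instruction into deterministic code,
the reviewer-mention policy quoted later in this thread could be enforced by
a small post-processing step instead of relying on the LLM to follow it. This
is only a sketch: the function name is hypothetical, the username pattern is
a simplified approximation of GitHub logins, and it assumes the comment body
and the PR author's login are already available as strings.

```python
import re

# Simplified approximation of a GitHub login: alphanumerics and hyphens.
# The lookbehind skips e-mail addresses and mentions already inside backticks.
_MENTION = re.compile(r"(?<![\w`])@([A-Za-z0-9][A-Za-z0-9-]{0,38})")


def neutralize_mentions(body: str, author: str) -> str:
    """Drop the '@' from every mention except the PR author's.

    The reviewer is still referenced by name, but no notification is sent.
    """

    def replace(match: re.Match) -> str:
        login = match.group(1)
        if login.lower() == author.lower():
            return match.group(0)  # keep the author's mention untouched
        return login  # reference the user without @-mentioning them

    return _MENTION.sub(replace, body)
```

Usage, with a shortened version of the comment discussed in this thread:

```python
comment = "@korex-f: a reviewer (@ashb) requested changes."
print(neutralize_mentions(comment, author="korex-f"))
# prints "@korex-f: a reviewer (ashb) requested changes."
```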

This appears to reconcile, in a nice way, the seemingly contradictory
expectations that different people have. I am about to produce stats from
the last run and was just about to propose this approach.

How does it sound, Ash, Amogh, Shahar and others?

J.


On Tue, Jun 9, 2026 at 12:55 PM Ash Berlin-Taylor <[email protected]> wrote:

Why? Because AI “instructions” cannot be trusted. And I am after a signal
that people are blindly using LLMs without enough human introversion.

Want a prime example?

The pr triage skill.

You already updated the instructions to not `@` the reviewer here

https://github.com/apache/airflow-steward/blob/76cfa5e1d2e682b88df5205e9cda396df51a66b6/skills/pr-management-triage/comment-templates.md#reviewer-mention-policy

> When a comment's only addressee is the PR author (the
> request-author-confirmation, reviewer-ping author-primary, and
> review-nudge author-primary templates), the body references the reviewer
> without @-mentioning them

And yet the LLM did it again:
https://github.com/apache/airflow/pull/66633#discussion_r3344849352

> @korex-f — A reviewer (@ashb) has requested changes on this PR, so I've
> removed the ready for maintainer review label — the next step is on your
> side. Could you address the review comments (push a fix, or reply
> in-thread explaining why the feedback doesn't apply)? Once addressed,
> re-request review from @ashb or re-mark the PR ready and it returns to the
> maintainer queue. Thank you.

And frankly I’m tired of all this shit.

I want to do anything and everything to reduce the drive by contribution
with no human activity. I’m happy to spend my time helping humans, but if
they are just going to feed that back to an LLM and burn an egregious
amount of carbon: no thank you.

-ash

On 9 Jun 2026, at 10:38, Jarek Potiuk <[email protected]> wrote:

Hi Ash, Amogh, and Shahar,

Ash, I'm curious to learn more about how the "brown m&m test" differs from
our current request for agents to identify themselves. Could you help me
understand the flow and the specific benefits you see? It feels similar to
me, but I'd love to hear your perspective in case I'm missing a nuance.

Regarding the `gh pr create --web` approach, we included those instructions
to ensure we meet ASF legal guidelines for Gen-AI headers, and to support
contributors who might not have Copilot. That said, if you have ideas on
how to trim the context or improve the templates, we truly appreciate PRs
that improve them (and many people have already sent some). AGENTS.md is a
team effort, and we’re always looking for ways to make it better. Let's keep
our collaboration positive as we refine these processes together.

Amogh and Shahar, yep, the idea of a validation step in the CI for
first-time contributions is something we should implement sooner or later.
I have actually been gathering stats on this for the last two weeks. I’ve
been preparing to see how manually triggered triage tasks can turn into
automated ones; I'm gathering stats on when human judgment is needed. I
shared some stats about this recently and will continue gathering them. The
next step is discussing here what and how we can automate.

Also, the current triage process already uses our Pull Request criteria to
pre-classify the PRs and only marks them with "ready for maintainer review"
if those criteria are met. So, if there are any specific criteria you’d
like to see added to our "Pull request criteria," PRs are most welcome
there as well.

Best regards,

Jarek


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

