Hello here, I have seen only positive comments and a number of improvements and ideas already added - I am starting a [LAZY CONSENSUS] on attempting to gradually introduce the approach.
J.

On Wed, Mar 4, 2026 at 10:13 PM Jarek Potiuk <[email protected]> wrote:
>
> > I just fear that (soon) if AI costs are put to realistic price levels we need to check if contributors still have and get free AI bot access, else the idea is melting fast. (Low risk though, let's see if this happens we need to just change the approach... or look for funding)
>
> If that happens, we will not have to deal with the problem in the first place, because it will also be costly for those who create the slop, not only for us.
>
> Also - I assess (I will know more when I start doing it, and this is one of the things I am going to track over time) that ~90% of the filter for now is purely deterministic and FAST - I think the crux of the solution is not to employ the AI, but to assess as quickly as possible whether we should look at the PR at all.
>
> So this change is mostly a change to our process:
>
> a) maintainers won't look at drafts (firmly)
> b) clearly communicate to contributors that this will happen and specify what they need to do
> c) relentlessly and without hesitation (but with oversight) convert PRs to drafts when we quickly assess they are bad - and tell authors how to fix them
>
> The LLM there is just one of the checks - and the LLM check is fired only when all other easily and deterministically verifiable criteria are met. And I do hope that when we reach the LLM check it will mostly say "fine" - because it's very likely that those PRs are **actually** worth looking at. I think most of our future work as maintainers will be deciding what we want to accept (or work on) - rather than spending time assessing code quality nitpicks. For me this is a natural consequence of what we've always been doing with static code checks. I do remember times when (even in Airflow) our reviews included comments about bad formatting and missing licences.
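[Editor's note] The check ordering described above - cheap deterministic filters first, the LLM fired only when everything else passes - can be sketched roughly as below. All names are illustrative assumptions, not the actual `breeze pr auto-triage` code:

```python
# Sketch: run fast deterministic checks first; only fall through to the
# (slow, costly) LLM check when every deterministic criterion is met.
from dataclasses import dataclass


@dataclass
class PR:
    ci_green: bool = False
    has_description: bool = False
    has_tests: bool = False


def deterministic_checks(pr: PR) -> list[str]:
    """Fast filters - expected to catch ~90% of problematic PRs."""
    problems = []
    if not pr.ci_green:
        problems.append("CI is not green - please fix failing checks")
    if not pr.has_description:
        problems.append("PR description is missing")
    if not pr.has_tests:
        problems.append("No tests found for the change")
    return problems


def triage(pr: PR, llm_check) -> list[str]:
    """Only call the LLM when all deterministic checks pass."""
    problems = deterministic_checks(pr)
    if problems:
        return problems  # short-circuit: no LLM call needed
    return llm_check(pr)  # most PRs reaching this point should be fine
```

The short-circuit is the point: the LLM is just the last of the checks, so most bad PRs never cost a token.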
> Yes, that was the case - up until we introduced pre-commit (one reason I introduced it, and one of the first rules was to add licence headers automatically). This grew to over 170 checks that we don't even have to think about. I see what we are doing here as the natural next step.
>
> I am of course exaggerating a bit. I still review AI generated code and check its quality, asking agents to correct it when it doesn't meet my standards. In fact, I review it in detail because I learn something new every time. But I am exaggerating only slightly when describing the focus I think we as maintainers will need to prioritize in the future.
>
> Another thing - the ASF is already looking for a sponsor to cover AI usage for ASF maintainers. I also know at least one company considering giving free access (under certain conditions - not sponsoring, but related to what goal the tokens will be used for) to all OSS maintainers in general in case this will be needed in the future.
>
> J.
>
> On Wed, Mar 4, 2026 at 9:42 PM Jens Scheffler <[email protected]> wrote:
> >
> > I like the idea and also assume that we can adjust and improve rules and expectations over time.
> >
> > I just fear that (soon) if AI costs are put to realistic price levels we need to check if contributors still have and get free AI bot access, else the idea is melting fast. (Low risk though, let's see if this happens we need to just change the approach... or look for funding)
> >
> > On 04.03.26 08:13, Jarek Potiuk wrote:
> > >> Another manual step (and bottleneck) in triaging PRs is that maintainers will still need to approve CI runs on GitHub.
> > >
> > > Great point ... and ... it's already handled :) - look at my PR.
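[Editor's note] The licence-header example above boils down to the kind of deterministic check pre-commit runs. A minimal sketch - the real Airflow hooks insert full ASF headers rather than just flagging their absence, and the function name here is hypothetical:

```python
# Minimal sketch of a licence-header check of the kind pre-commit runs.
# The real Airflow pre-commit hooks can insert the header automatically;
# this only flags files where the ASF marker is missing near the top.

ASF_MARKER = "Licensed to the Apache Software Foundation"


def files_missing_licence(files: dict[str, str]) -> list[str]:
    """Return names of files lacking the ASF marker in their first 500 chars."""
    return [
        name
        for name, content in files.items()
        if ASF_MARKER not in content[:500]
    ]
```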
> > >
> > > When - during triage - the triager sees that workflow approval is needed, my nice little tool will print the diff of the incoming PR on the terminal and ask the triager to confirm that there is nothing suspicious; after saying "y", the workflow run will be approved.
> > >
> > > J.
> > >
> > > On Wed, Mar 4, 2026 at 3:35 AM Zhe-You Liu <[email protected]> wrote:
> > >
> > >> Hi all,
> > >>
> > >> Thanks Jarek for bringing up the auto-triage idea!
> > >> Big +1 from me on the “let’s try” decision.
> > >>
> > >> I really like this feature; it can help avoid copy‑pasting or repeatedly writing similar instructions for contributors to fix baseline test failures.
> > >>
> > >> I had the same thoughts as Wei regarding flaky tests. Having deterministic checks or automated comments should be enough to handle flaky test issues, and contributors can still reach out on Slack to get their PRs reviewed, so this should not be a problem.
> > >>
> > >> Another manual step (and bottleneck) in triaging PRs is that maintainers will still need to approve CI runs on GitHub. It doesn’t seem safe to fully automate CI approval, as there could still be rare cases where an attacker creates a vulnerable PR that logs environment variables during tests. Even though we could use an LLM to check for these kinds of vulnerabilities before approving a CI run, it is still not as safe as a manual review in most cases (e.g. prompt injection attacks). I’m not sure whether anyone has a good idea for fully automated PR triaging -- for example, automatically approving CI, periodically checking test baselines for quality (via `breeze pr auto-triage`), re‑approving CI as needed, and continuing this loop until all CI checks are green.
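[Editor's note] The confirm-then-approve step described at the top of this message could look roughly like the sketch below. The approval endpoint (`POST /repos/{owner}/{repo}/actions/runs/{run_id}/approve`) is the real GitHub REST API for approving fork workflow runs; the function names and the use of the `gh` CLI are illustrative assumptions, not the actual tool:

```python
# Sketch: show the incoming diff, ask the triager for explicit confirmation,
# and only then approve the pending workflow run via the GitHub API.
import subprocess


def confirmed(answer: str) -> bool:
    """Treat only an explicit 'y' / 'yes' as approval."""
    return answer.strip().lower() in ("y", "yes")


def review_and_approve(repo: str, pr_number: int, run_id: int) -> bool:
    # Print the PR diff so the triager can look for anything suspicious
    # (e.g. workflow changes or tests that log environment variables).
    subprocess.run(["gh", "pr", "diff", str(pr_number), "--repo", repo], check=True)
    answer = input("Nothing suspicious - approve the workflow run? [y/N] ")
    if not confirmed(answer):
        return False
    subprocess.run(
        ["gh", "api", "--method", "POST",
         f"repos/{repo}/actions/runs/{run_id}/approve"],
        check=True,
    )
    return True
```

Defaulting to "no" (anything but an explicit "y"/"yes" declines) keeps the human firmly in the loop, which matches the concern about attacker-controlled CI runs.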
> > >>
> > >> Best regards,
> > >> Jason
> > >>
> > >> On Tue, Mar 3, 2026 at 10:48 PM Vincent Beck <[email protected]> wrote:
> > >>
> > >>> I like the overall strategy; for sure the tool will need continuous iterations to handle all the different scenarios. But this is definitely needed - the number of open PRs just skyrocketed in the last few months, and it is very hard/impossible to keep track of everything.
> > >>>
> > >>> On 2026/03/03 14:39:41 Jarek Potiuk wrote:
> > >>>>>
> > >>>>> Thanks for bringing this up! Overall, I like this idea, but it's worth testing it for a bit before we enforce it, especially the LLM-verify part.
> > >>>>
> > >>>> Oh absolutely. My plan to introduce it is (after the community hopefully makes an overall "let's try" decision):
> > >>>>
> > >>>> * The human triager is always in the loop, quickly reviewing comments just before they are posted to the user (until we achieve high confidence)
> > >>>> * I plan to run it myself as the sole triager for some time to perfect it and to pay much more attention initially. I will start with smaller groups/areas of code and expand as we go - possibly adding more maintainers willing to participate in triaging and testing/improving the tool
> > >>>> * See how quickly we can do it on a regular basis - whether we need several triagers or perhaps one rotational triager handling all PRs from all areas at a time.
> > >>>> * Possibly further automate it. My assessment is that we will have 90% of deterministic "fails" - those we can easily automate without hesitation once the process and expectations are in place. The LLM part is a bit more nuanced and we can decide after we try.
> > >>>>
> > >>>>> * The author ensures the PR passes ALL the checks and tests (i.e. green).
> > >>>>>> It might sometimes mean we have to react even more quickly to `main` breakages, and probably provide some "status" info and exceptions when we know main is broken.
> > >>>>>
> > >>>>> Probably, we should exempt some checks that might be flaky?
> > >>>>
> > >>>> Yeah - this part is a bit problematic - but we can likely also add an easy, automated, deterministic check to see if the failure is happening for others. Sending an automated comment like, "Please rebase now, the issue is fixed," to the authors would be super useful when they see unrelated failures. This is something we **should** figure out during testing. There will be plenty of opportunities :D
> > >>>>
> > >>>>>> * All PRs that do not meet this requirement will be converted to Drafts with automated suggestions (reviewed quickly and efficiently by a triager) provided to the author on the next steps.
> > >>>>>
> > >>>>> This will be super helpful! I also do it manually from time to time.
> > >>>>
> > >>>> Yes. I believe converting to Draft is an extremely strong (but fair) signal to the author: "Hey, you have work to do."
> > >>>>
> > >>>> Also, when this is accompanied by an actionable comment like, "Here is what you should do and here is the link describing it," it immediately filters out people who submit PRs without much work.
> > >>>>
> > >>>> Surely - they might feed the comment into their agent anyway (or it can read it automatically and act). But if our tool is faster and cheaper and more accurate (because of the smart human in the driver's seat) than their tools, we gain the upper hand. And it should be faster - because we only check the expectations rather than figuring out what to do.
> > >>>>
> > >>>> Then in the worst case we will have continuous ping-pong (Draft -> Undraft -> Draft), but we will control how fast this loop runs. Generally, our goal should be to slow it down rather than respond immediately; for example, running it daily or every two days is a good idea.
> > >>>>
> > >>>> Effectively, if the PR is in the "ready for maintainer review" state, the maintainer should be quite certain that the code quality, tests, etc., are all good. Only then should they take a look (and they can immediately say, "No, this is not what we want") - and this is absolutely fine as well. We should not optimize for contributors spending time on work we might not accept. This is deliberately not a goal for me. This will automatically mean that new contributors who want to contribute significant changes will mostly waste a lot of time and their PRs will be rejected.
> > >>>>
> > >>>> This is largely what we are already doing, mostly because those PRs do not follow our "tribal knowledge," which the agent cannot easily derive. Naturally, new contributors should start with small, easy-to-complete tasks that can be easily discarded if reviewers reject them. This is what we always asked people to start with. So this approach with the triage tool also largely supports this: someone new rewriting the proverbial scheduler will have to spend significant time ensuring "auto-triage" passes, only to have the idea completely rejected by the reviewer or be asked for a complete rewrite. And this is perfectly fine. We always encouraged newcomers to start with small tasks, learn the basics, and "grow" until they were ready to propose bigger changes or split them into much smaller chunks.
> > >>>> With "auto-triage" this will be natural and expected, requiring authors to invest more time and effort to reach the "ready for review" status.
> > >>>>
> > >>>> And I think it's absolutely fair and restores the balance we so badly need now.
> > >>>>
> > >>>>>
> > >>>>> Best,
> > >>>>> Wei
> > >>>>>
> > >>>>>> On Mar 3, 2026, at 9:34 PM, Jarek Potiuk <[email protected]> wrote:
> > >>>>>>
> > >>>>>> *TL;DR: I propose a stricter (automation-assisted) approach for the "ready for review" state and clearer expectations for contributors regarding when maintainers review PRs of non-collaborators.*
> > >>>>>>
> > >>>>>> Following the https://lists.apache.org/thread/8tzwwwd7jmtmfo4j9pzg27704g10vpr4 thread, where I showcased a tool that I claude-coded, I would like to have a (possibly short) discussion on this subject and reach a stage where I can attempt to try the tool out.
> > >>>>>>
> > >>>>>> *Why?*
> > >>>>>>
> > >>>>>> Because we maintainers are overwhelmed and burning out, we no longer see how our time invested in Airflow can bring significant returns to us (personally) and the community.
> > >>>>>>
> > >>>>>> While some of us spend a lot of time reviewing, commenting on, and merging code, with the current rate of AI-generated PRs and other things we do, this is not sustainable. Also, there is a mismatch - or lack of clarity - regarding the quality expectations for the PRs we want to review.
> > >>>>>> *Social Contract Issue*
> > >>>>>>
> > >>>>>> We are a good (I think) open source project with a thriving community and a great group of maintainers who are also friends and like to work with each other, but also are very open to bringing new community members in. As maintainers, we are willing to help new contributors grow and generally willing to spend some of our time doing so. This is the social contract we signed up for as OSS maintainers and as committers for the Apache Software Foundation PMC. Community Over Code.
> > >>>>>>
> > >>>>>> However, this social contract - this community-building aspect - is currently heavily imbalanced, because AI-generated content takes away time, focus and energy from the maintainers. Instead of having meaningful discussions in PRs about whether changes are needed and communicating with people, we start losing time talking to - effectively - AI agents about hundreds of smaller and bigger things that should not be there in the first place. Currently, collaboration and community building suffer. Even if real people submit code generated by agents (which is becoming really good, fast and cheap to produce), we simply lack the time as maintainers to have meaningful conversations with the people behind those agents.
> > >>>>>>
> > >>>>>> Sometimes we lose time talking to agents. Sometimes we lose time talking to people who have zero understanding of what they are doing and submit continuous crap, and we should not be having that conversation at all.
> > >>>>>> Sometimes, we just look at the number of PRs opened in a given day in despair, dreading even trying to bring order to them.
> > >>>>>>
> > >>>>>> And many of us also have some "work" to do or a "feature" to work on top of that.
> > >>>>>>
> > >>>>>> I think we need to reclaim the maintainers' collective time to focus on what matters: delegating more responsibility to authors so they meet our expected quality bar (and efficiently verifying it with tools, without losing time and focus).
> > >>>>>>
> > >>>>>> *What do we have now?*
> > >>>>>>
> > >>>>>> We have already done a lot to help with it - AGENTS.md. The PR guidelines, overhauled by Kaxil and updated by others, will certainly help clarify expectations for agents in the future. I know Kaxil is also exploring a way to enable automated Copilot code reviews in a manner that will not be too "dehumanizing" and will work well. This is all good. The better the agents people use and the more closely they follow those instructions, the higher the quality of incoming PRs will be. But we also need to help maintainers easily identify what to focus on - distinguishing work-in-progress and unfinished PRs that need work from those truly "Ready for (human) review."
> > >>>>>>
> > >>>>>> *How?*
> > >>>>>>
> > >>>>>> My proposal has two parts:
> > >>>>>>
> > >>>>>> * Define and communicate expectations for PRs that maintainers can manage.
> > >>>>>> * Relentlessly automate it to ensure expectations are met and that maintainers can easily focus on those PRs that are "Ready for review."
> > >>>>>>
> > >>>>>> My tool (needs a bit more fine-tuning and refinement): https://github.com/apache/airflow/pull/62682 `*breeze pr auto-triage*` is designed to do exactly this: automate those expectations by auto-triaging the PRs. It not only converts them to Draft when they are not yet "Ready For Review," but also provides actionable, automated (deterministic + LLM) comments to the authors. A concrete maintainer (the current triager) is using the tool very efficiently.
> > >>>>>>
> > >>>>>> *Proposed expectations (for non-collaborators):*
> > >>>>>>
> > >>>>>> Those are not "new" expectations. Really, I'm proposing we completely delegate the responsibility for fulfilling those expectations to the author (with helpful, automated comments - reviewed and confirmed by a human triager for now). And simply be very clear that generally no maintainer will look at a PR until:
> > >>>>>>
> > >>>>>> * The author ensures the PR passes ALL the checks and tests (i.e. green). It might sometimes mean we have to react even more quickly to `main` breakages, and probably provide some "status" info and exceptions when we know main is broken.
> > >>>>>>
> > >>>>>> * The author follows all PR guidelines (LLM-verified) regarding description, content, quality, and presence of tests.
> > >>>>>>
> > >>>>>> * All PRs that do not meet this requirement will be converted to Drafts with automated suggestions (reviewed quickly and efficiently by a triager) provided to the author on the next steps.
> > >>>>>>
> > >>>>>> * Drafts with no activity will be more aggressively pruned by our stalebot.
> > >>>>>> The triager is there mostly to quickly assess and generate comments - with tool/AI assistance. The triager won't be the one who actually reviews those PRs when they are "ready for review."
> > >>>>>>
> > >>>>>> * Only after that do we mark the PR as "*ready for maintainer review*" (label)
> > >>>>>>
> > >>>>>> * Only such PRs should be reviewed, and it is entirely up to the author to make them ready.
> > >>>>>>
> > >>>>>> Note: This approach is only for non-collaborators. For collaborators, we might have just one expectation - mark your PR with "ready for maintainer review" when you think it's ready. We accept people as committers and collaborators because we already know they generally know and follow the rules; automating this step isn't necessary.
> > >>>>>>
> > >>>>>> This is nothing new; we've already been doing this with humans handling all the heavy lifting, without much strictness or organization, but this is no longer sustainable.
> > >>>>>>
> > >>>>>> I propose we make the expectations explicit, communicate them clearly, and relentlessly automate their execution.
> > >>>>>>
> > >>>>>> I would love to hear what y'all think.
> > >>>>>>
> > >>>>>> J.
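[Editor's note] Taken together, the expectations in the proposal above reduce to a small per-PR decision for non-collaborators. A sketch under the assumptions of this thread - the action strings and parameter names are illustrative, not the tool's actual labels:

```python
# Sketch of the proposed triage decision for a non-collaborator PR.

def triage_action(is_draft: bool, ci_green: bool, guidelines_ok: bool) -> str:
    if is_draft:
        return "skip"  # maintainers firmly do not look at drafts
    if not (ci_green and guidelines_ok):
        # converted back to Draft, with an actionable comment on next steps
        return "convert-to-draft"
    return "label-ready-for-maintainer-review"
```

The worst case is the Draft -> Undraft -> Draft ping-pong mentioned earlier, which stays manageable because the community controls how often this function is run (e.g. daily).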
> > >>>>>
> > >>>>> ---------------------------------------------------------------------
> > >>>>> To unsubscribe, e-mail: [email protected]
> > >>>>> For additional commands, e-mail: [email protected]
