Hi Jarek, I want to apologize. Looking back at my last note, I can see why it felt like an advertisement. I was trying to keep the email short to avoid cluttering the list, but in leaving out the details and context I gave the wrong impression.
To answer your question about who is using this and who wrote it: it is just me. I wrote it as a personal prototype to handle the "AI slop" I was seeing in my own volunteer work. To be clear, I am not a vendor. I have been an individual contributor in this ecosystem for years, working on projects like Spark, Kafka, and JSR 368, and I simply want to help solve a problem we are all facing - I can see that many Apache projects are busy with the same discussion or are adding AI guidelines.

You raised a critical point about sustainability and the risk of relying on "free" AI tools that might disappear or become expensive when the bubble pops. I completely agree, and I actually designed the tool with this specifically in mind: it runs primarily on deterministic rules (like AST analysis) and only triggers LLM checks if a sponsored connection is available. I never wanted it to depend on a "free lunch" that might vanish.

The tool is already Apache 2.0 licensed, and I am happy to donate it to the community for validation. To be honest, with my limited capacity I cannot maintain it alone. My hope is that we can work together to prove the concept and eventually share a solution with other ASF projects struggling with this same issue.

That is why your suggestion about GitHub Agentic Workflows sounds like the ideal long-term home for this. It leverages the "Enterprise" sponsorship infrastructure the ASF already has, ensuring we own the process and are not hit with surprise bills later. I would love to collaborate on porting the logic I prototyped - specifically the checks for "hollow" boilerplate and hallucinated imports - directly into those native GitHub workflows (I have appended a small illustrative sketch of what I mean by a deterministic check below Jarek's quoted mail). That way we get the validation we need, but it lives 100% within the project's own sustainable infrastructure.

Regards,
Vaquar Khan

LinkedIn - https://www.linkedin.com/in/vaquar-khan-b695577/
Book - https://us.amazon.com/stores/Vaquar-Khan/author/B0DMJCG9W6?ref=ap_rdr&shoppingPortalEnabled=true
GitBook - https://vaquarkhan.github.io/microservices-recipes-a-free-gitbook/
Stack Overflow - https://stackoverflow.com/users/4812170/vaquar-khan
GitHub - https://github.com/vaquarkhan

On Tue, Feb 17, 2026 at 4:23 AM Jarek Potiuk <[email protected]> wrote:

> First of all Vaquar,
>
> Let's not dilute the main topic. I think we should avoid advertising new things on the devlist. That's why I changed the subject to keep it in a separate thread.
>
> One thing is to have someone who tested and wrote about your tool, and another is to drop an advertising-like thing into a merit discussion - please try to avoid that. I know it's slightly relevant, but this seems a bit too advertise-y.
>
> When it comes to showcasing things, #random on Slack - before people get convinced that this is interesting - is the right place, I think. The devlist is not a good place for advertising your own solutions; if we allow that, soon we will be flooded with people trying to get us to use something.
>
> *But... Back to the subject.*
>
> To all, and to the main subject. Brace for it - it's going to be long. I have been thinking about it for a long time and have discussed it with many people. I know it might take a while for people to digest, so I do not expect fast answers, but I think this is the most important thing that we have to discuss and agree on in the coming months.
> Indeed, some of us are looking at tools to ease review, and I am sure we will soon want to adopt some approach, so if you could share some examples based on real PRs asynchronously in #random, it would be helpful for assessing it - note that we prefer asynchronous communication, and ideally in the relevant channels.
>
> And maybe that thread is a good place to find out if someone has already used such tools and has some experience with them. I think we can start discussing things here, but not starting from concrete tools - first of all discussing what we would like from whatever we end up using.
>
> I personally have a few important generic "properties" of such tools that I would like to see - more about "what" we want to achieve with them, long before we decide "how" we are going to do it.
>
> 1) *First of all, sustainability*: Ideally, it should be a fully open-source solution, with known governance, some form of backing, and a promise of sustainability - one that we can sustain without having to rely on someone subsidizing agentic AI forever.
>
> This might be in the form of a recurring, long-term donation and sponsorship to the ASF (this is what GitHub does, and AWS with their credits for us, for example). I cannot stress enough how important this is - because currently AI is super-cheap, almost free. It is heavily subsidized by VC money, but this is not going to stay like that forever.
>
> We cannot rely long-term on something that we will suddenly have to start paying for - because, essentially, we have no money as a community. Unless we have a long-term, financially stable stakeholder committing to cover the costs in case they increase, we should not rely heavily on such tools. We have some reliance on Dosu now - with its auto-labeling features - and that is already a bit of a stretch.
>
> While we do not know when, the AI bubble will pop - one way or another - and many of the projects will run out of money, will not get more, and will disappear. Those who survive will thrive, but for us, relying on the survival of some of them (unless we have an easy way out) is just very dangerous.
>
> Currently the ASF gets a big, recurring, long-term sponsorship from GitHub. We get a lot from them - the ASF is on an "Enterprise" level plan and we have a lot of things "for free" as part of that account - with some limits we have to adhere to, but we can count on it, because the ASF does. That's why for every kind of tool my first question is: "Why will GitHub not do it better soon?"
>
> When I look at all kinds of review tools, my thoughts have been going for a few days (GitHub announced it 4 days ago) to
> https://github.blog/ai-and-ml/automate-repository-tasks-with-github-agentic-workflows/
> - which basically allows you to describe any kind of workflow interacting with the repo, using workflows described in ... wait for it ... markdown files.
>
> Yes. When you look at it, we should soon be able - when it gets out of technical preview - to describe all kinds of behaviours we want from the workflows via a markdown text description - like a skill. Can it get simpler than that? I don't think so. We will have full control and will describe in natural language what we want to happen during review. With all the bells and whistles, built-in security guardrails, and without having to trust any third parties to implement some of those workflows.
> All that within a long-term-sustainable subscription that the ASF has with a partner that is unlikely to go bust (and if it does, we will have to switch to alternative git hosting anyway). So whenever any other solution is considered, the first question should be: "How is it sustainable, why is it better than GitHub, and what can we do ourselves by describing what we want in natural language?"
>
> 2) *Human in the loop* - while we want to filter out bad submissions without engaging the mental effort of maintainers/triagers, we cannot remove humans from it. We need to make sure humans stay in the loop.
>
> Any kind of automation should be done very carefully, without posting something on our behalf to - potentially - human contributors who genuinely want to help (with or without AI assistance - that part does not matter), without human maintainers being involved. When it's deterministic - fine, but non-deterministic answers produced without human oversight are, for me at least, a no-go.
>
> I am tracking security-related issues for agentic workflows and it's terrifying - if you know SQL-injection-style vulnerabilities, many agentic workflow vulnerabilities are the same, just a few orders of magnitude worse. But the principle is the same - if you take untrusted user input, process it, and produce output from it without proper validation and guardrails, BAD things happen. We see it every day in the news.
>
> So any tool we would like to use has to be security-first, with plenty of guardrails, and it should never, ever publish anything on our behalf that a human did not review. We should do everything to make that step super easy and efficient, but IMHO we cannot ever skip it ... One, because of security, and two, because we should ...
>
> 3) *Focus on collaboration and people, not code* ("community over code").
>
> For me, review is not about finding bugs in the code. That is secondary, and mostly addressable already by deterministic static checks and tests. Review of incoming submissions is not about efficiency in finding bugs; it's all about human-human interaction.
>
> At least for us, in open source, it is important how people think, whether they respond to our requests, how they communicate, and whether they are going to be good community members. Whether they are self-sufficient and can do stuff on their own even in the presence of a not-perfect specification and problem description, or whether they require a lot of hand-holding.
>
> That's why in our communication you can see that sometimes we are "harsh", sometimes we are more receptive and "mentor-y", sometimes we just plainly ignore things (which is quite wrong IMHO), and sometimes we are very direct and blunt. The review process is the way our community is built.
>
> Code is merely a byproduct of that collaboration, and bug finding is not the only important thing in code review. This is the motto of the ASF - "Community Over Code" - and in the current AI world it is far more important than ever, because code is cheap (like, super cheap) and community is even more difficult to build than it was - because of all the spam and AI.
>
> Whatever tools we accept should be focused on that. This IS the strength of the ASF and Airflow, and if we lose it, we lose the biggest asset we have - because, again, code is cheap and communities are as difficult to build as ever - or even more so.
> I would love to hear what others have to say about that, but those are my thoughts - long thought about and finally put into words.
>
> J.
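P.S. To make the "deterministic rules first, LLM only as a fallback" point concrete, below is a minimal illustrative sketch of the kind of AST-based check I prototyped for hallucinated imports - it is not the actual tool code, and the function name and the repo-layout heuristic are just placeholders. It parses a changed Python file and flags top-level imports that resolve neither as stdlib or installed packages nor as modules in the repository tree; anything it flags would go to a human reviewer, never straight into an automated comment.

```python
# Illustrative sketch only (assumed helper name and heuristics), not the prototype itself.
# Flags imports in a Python file that do not resolve against the environment
# or the repository source tree - a cheap, deterministic "hallucinated import" signal.
import ast
import importlib.util
import pathlib
import sys


def find_unresolvable_imports(source: str, repo_root: pathlib.Path) -> list[str]:
    """Return imported module names that are neither stdlib, nor installed,
    nor present as first-party modules/packages under repo_root."""
    tree = ast.parse(source)
    suspects = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue  # skip relative imports and non-import nodes
        for name in names:
            top = name.split(".")[0]
            if top in sys.stdlib_module_names:
                continue  # standard library
            if importlib.util.find_spec(top) is not None:
                continue  # installed third-party package
            if (repo_root / f"{top}.py").exists() or (repo_root / top).is_dir():
                continue  # first-party module or package in the repo
            suspects.append(name)
    return suspects


if __name__ == "__main__":
    # Usage: python check_imports.py path/to/changed_file.py
    changed_file = pathlib.Path(sys.argv[1])
    for name in find_unresolvable_imports(changed_file.read_text(), pathlib.Path(".")):
        print(f"possibly hallucinated import: {name}")
```

This kind of check is fully deterministic, so it costs nothing to run in CI and does not depend on any subsidized AI service; the LLM-based checks only come into play on top of it, if sponsored capacity is available.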
