Hi Jarek, I want to apologize. Looking back at my last note, I can see why it felt like an advertisement. I was trying to keep the email short to avoid cluttering the list, but in leaving out the details and context I gave the wrong impression.
To answer your question about who is using this and who wrote it: it is just me. I wrote it as a personal prototype to handle the "AI slop" I was seeing in my own volunteer work. To be clear, I am not a vendor. I have been an individual contributor in this ecosystem for years, working on projects like Spark, Kafka, and JSR 368, and I simply want to help solve a problem we are all facing - I can see that many Apache projects are busy with the same discussion or are adding AI guidelines.

You raised a critical point about sustainability and the risk of relying on "free" AI tools that might disappear or become expensive when the bubble pops. I completely agree, and I actually designed the tool with this specifically in mind: it runs primarily on deterministic rules (like AST analysis) and only triggers LLM checks if a sponsored connection is available. I never wanted it to depend on a "free lunch" that might vanish.

The tool is already Apache 2.0 licensed, and I am happy to donate it to the community for validation. To be honest, with my limited capacity I cannot maintain it alone. My hope is that we can work together to prove the concept and eventually share a solution with other ASF projects struggling with this same issue.

That is why your suggestion about GitHub Agentic Workflows sounds like the ideal long-term home for this. It leverages the "Enterprise" sponsorship infrastructure the ASF already has, ensuring we own the process and are not hit with surprise bills later. I would love to collaborate on porting the logic I prototyped - specifically the checks for "hollow" boilerplate and hallucinated imports - directly into those native GitHub workflows (I have appended a small illustrative sketch of what I mean by a deterministic check below Jarek's quoted mail). That way we get the validation we need, but it lives 100% within the project's own sustainable infrastructure.

Regards,
Vaquar Khan

LinkedIn - https://www.linkedin.com/in/vaquar-khan-b695577/
Book - https://us.amazon.com/stores/Vaquar-Khan/author/B0DMJCG9W6?ref=ap_rdr&shoppingPortalEnabled=true
GitBook - https://vaquarkhan.github.io/microservices-recipes-a-free-gitbook/
Stack Overflow - https://stackoverflow.com/users/4812170/vaquar-khan
GitHub - https://github.com/vaquarkhan

On Tue, Feb 17, 2026 at 4:23 AM Jarek Potiuk <[email protected]> wrote:

> First of all Vaquar,
>
> Let's not dilute the main topic. I think we should avoid advertising new things on the devlist. That's why I changed the subject to keep it in a separate thread.
>
> One thing is to have someone who tested and wrote about your tool, and another is to drop an advertising-like thing into a merit discussion - please try to avoid that. I know it's slightly relevant, but this seems a bit too advertise-y.
>
> When it comes to showcasing things, #random on Slack - before people get convinced that this is interesting - is the right place, I think. The devlist is not a good place for advertising your own solutions; if we allow that, soon we will be flooded with people trying to get us to use something.
>
> *But... Back to the subject.*
>
> To all, and to the main subject. Brace for it - it's going to be long. I have been thinking about it for a long time and have discussed it with many people. I know it might take a while for people to digest, so I do not expect fast answers, but I think this is the most important thing that we have to discuss and agree on in the coming months.
> Indeed, some of us are looking at tools to ease review, and I am sure we will soon want to adopt some approach, so if you could share some examples based on real PRs asynchronously in #random, it would be helpful for assessing it - note that we prefer asynchronous communication, and ideally in the relevant channels.
>
> And maybe that thread is a good place to find out if someone has already used such tools and has some experience with them. I think we can start discussing things here, but not starting from concrete tools - first of all discussing what we would like from whatever we end up using.
>
> I personally have a few important generic "properties" of such tools that I would like to see - more about "what" we want to achieve with them, long before we decide "how" we are going to do it.
>
> 1) *First of all, sustainability*: Ideally, it should be a fully open-source solution, with known governance, some form of backing, and a promise of sustainability - one that we can sustain without having to rely on someone subsidizing agentic AI forever.
>
> This might be in the form of a recurring, long-term donation and sponsorship to the ASF (this is what GitHub does, and AWS with their credits for us, for example). I cannot stress enough how important this is - because currently AI is super-cheap, almost free. It is heavily subsidized by VC money, but this is not going to stay like that forever.
>
> We cannot rely long-term on something that we will suddenly have to start paying for - because, essentially, we have no money as a community. Unless we have a long-term, financially stable stakeholder committing to cover the costs in case they increase, we should not rely heavily on such tools. We have some reliance on Dosu now - with its auto-labeling features - and that is already a bit of a stretch.
>
> While we do not know when, the AI bubble will pop - one way or another - and many of the projects will run out of money, will not get more, and will disappear. Those who survive will thrive, but for us, relying on the survival of some of them (unless we have an easy way out) is just very dangerous.
>
> Currently the ASF gets a big, recurring, long-term sponsorship from GitHub. We get a lot from them - the ASF is on an "Enterprise" level plan and we have a lot of things "for free" as part of that account - with some limits we have to adhere to, but we can count on it, because the ASF does. That's why for every kind of tool my first question is: "Why will GitHub not do it better soon?"
>
> When I look at all kinds of review tools, my thoughts have been going for a few days (GitHub announced it 4 days ago) to
> https://github.blog/ai-and-ml/automate-repository-tasks-with-github-agentic-workflows/
> - which basically allows you to describe any kind of workflow interacting with the repo, using workflows described in ... wait for it ... markdown files.
>
> Yes. When you look at it, we should soon be able - when it gets out of technical preview - to describe all kinds of behaviours we want from the workflows via a markdown text description - like a skill. Can it get simpler than that? I don't think so. We will have full control and will describe in natural language what we want to happen during review. With all the bells and whistles, built-in security guardrails, and without having to trust any third parties to implement some of those workflows.
> All that within a long-term-sustainable subscription that the ASF has with a partner that is unlikely to go bust (and if it does, we will have to switch to alternative git hosting anyway). So whenever any other solution is considered, the first question should be: "How is it sustainable, why is it better than GitHub, and what can we do ourselves by describing what we want in natural language?"
>
> 2) *Human in the loop* - while we want to filter out bad submissions without engaging the mental effort of maintainers/triagers, we cannot remove humans from it. We need to make sure humans stay in the loop.
>
> Any kind of automation should be done very carefully, without posting something on our behalf to - potentially - human contributors who genuinely want to help (with or without AI assistance - that part does not matter), without human maintainers being involved. When it's deterministic - fine, but non-deterministic answers produced without human oversight are, for me at least, a no-go.
>
> I am tracking security-related issues for agentic workflows and it's terrifying - if you know SQL-injection-style vulnerabilities, many agentic workflow vulnerabilities are the same, just a few orders of magnitude worse. But the principle is the same - if you take untrusted user input, process it, and produce output from it without proper validation and guardrails, BAD things happen. We see it every day in the news.
>
> So any tool we would like to use has to be security-first, with plenty of guardrails, and it should never, ever publish anything on our behalf that a human did not review. We should do everything to make that step super easy and efficient, but IMHO we cannot ever skip it ... One, because of security, and two, because we should ...
>
> 3) *Focus on collaboration and people, not code* ("community over code").
>
> For me, review is not about finding bugs in the code. That is secondary, and mostly addressable already by deterministic static checks and tests. Review of incoming submissions is not about efficiency in finding bugs; it's all about human-human interaction.
>
> At least for us, in open source, it is important how people think, whether they respond to our requests, how they communicate, and whether they are going to be good community members. Whether they are self-sufficient and can do stuff on their own even in the presence of a not-perfect specification and problem description, or whether they require a lot of hand-holding.
>
> That's why in our communication you can see that sometimes we are "harsh", sometimes we are more receptive and "mentor-y", sometimes we just plainly ignore things (which is quite wrong IMHO), and sometimes we are very direct and blunt. The review process is the way our community is built.
>
> Code is merely a byproduct of that collaboration, and bug finding is not the only important thing in code review. This is the motto of the ASF - "Community Over Code" - and in the current AI world it is far more important than ever, because code is cheap (like, super cheap) and community is even more difficult to build than it was - because of all the spam and AI.
>
> Whatever tools we accept should be focused on that. This IS the strength of the ASF and Airflow, and if we lose it, we lose the biggest asset we have - because, again, code is cheap and communities are as difficult to build as ever - or even more so.
> I would love to hear what others have to say about that, but those are my thoughts - long thought about and finally put into words.
>
> J.
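P.S. To make the "deterministic rules first, LLM only as a fallback" point concrete, below is a minimal illustrative sketch of the kind of AST-based check I prototyped for hallucinated imports - it is not the actual tool code, and the function name and the repo-layout heuristic are just placeholders. It parses a changed Python file and flags top-level imports that resolve neither as stdlib or installed packages nor as modules in the repository tree; anything it flags would go to a human reviewer, never straight into an automated comment.

```python
# Illustrative sketch only (assumed helper name and heuristics), not the prototype itself.
# Flags imports in a Python file that do not resolve against the environment
# or the repository source tree - a cheap, deterministic "hallucinated import" signal.
import ast
import importlib.util
import pathlib
import sys


def find_unresolvable_imports(source: str, repo_root: pathlib.Path) -> list[str]:
    """Return imported module names that are neither stdlib, nor installed,
    nor present as first-party modules/packages under repo_root."""
    tree = ast.parse(source)
    suspects = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue  # skip relative imports and non-import nodes
        for name in names:
            top = name.split(".")[0]
            if top in sys.stdlib_module_names:
                continue  # standard library
            if importlib.util.find_spec(top) is not None:
                continue  # installed third-party package
            if (repo_root / f"{top}.py").exists() or (repo_root / top).is_dir():
                continue  # first-party module or package in the repo
            suspects.append(name)
    return suspects


if __name__ == "__main__":
    # Usage: python check_imports.py path/to/changed_file.py
    changed_file = pathlib.Path(sys.argv[1])
    for name in find_unresolvable_imports(changed_file.read_text(), pathlib.Path(".")):
        print(f"possibly hallucinated import: {name}")
```

This kind of check is fully deterministic, so it costs nothing to run in CI and does not depend on any subsidized AI service; the LLM-based checks only come into play on top of it, if sponsored capacity is available.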
