Re: SPIP: Automated Integrity Validation (AIV) Gate for Apache Spark

Jungtaek Lim Tue, 17 Mar 2026 16:10:24 -0700

Personally, I would love to ask Vaquar to run the idea against OSS projects
and figure out its value first, rather than integrating first and validating
afterwards. I do not see anything that prevents running the idea without
actual integration. The only issue is the cost, but I hope he can get some
help from his employer if this proves useful. While it would take multiple
months to collect useful data from Apache Spark alone, it shouldn't take
that long if the run is expanded to many OSS projects, and the result would
be much more useful than trying to argue that the Apache Spark project needs
this.
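
To make that concrete, the kind of rule described in the quoted proposal
(for example, a missing .stop() call) can be evaluated offline on any
checked-out source tree, with no CI integration. The following is only a
minimal sketch under stated assumptions: it uses Python's standard ast
module on Python sources only, the "missing-stop" rule (a file that calls
getOrCreate() but never calls stop()) is a simplified stand-in, and the
names find_integrity_issues and scan_sources are hypothetical, not taken
from the SPIP.

```python
import ast


def find_integrity_issues(source: str) -> list[str]:
    """Return the (hypothetical) rule names violated by one Python source file."""
    tree = ast.parse(source)
    # Collect the attribute name of every method-style call, e.g. x.stop() -> "stop".
    called = {
        node.func.attr
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
    }
    issues = []
    # Assumed rule: a file that calls getOrCreate() should also call stop() somewhere.
    if "getOrCreate" in called and "stop" not in called:
        issues.append("missing-stop")
    return issues


def scan_sources(sources: dict[str, str]) -> dict[str, list[str]]:
    """Map each file name to its violations, skipping files that do not parse."""
    report = {}
    for name, text in sources.items():
        try:
            issues = find_integrity_issues(text)
        except SyntaxError:
            continue  # unparsable files are ignored rather than flagged
        if issues:
            report[name] = issues
    return report
```

Feeding scan_sources the contents of every .py file from a set of cloned
repositories, then comparing the flagged files with PRs that reviewers have
already judged, would give a rough false-positive rate before any
integration is attempted.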


On Wed, Mar 18, 2026 at 7:32 AM Holden Karau <[email protected]> wrote:

> I think for now we should probably avoid adding automated closing of
> possible AI PRs. I think we are not as badly impacted (knock on wood) as
> some projects, and having a human in the loop for closing is reasonable. If
> we start getting a bunch of seemingly OpenClaw-generated PRs, then we can
> revisit this.
>
> On Tue, Mar 17, 2026 at 3:07 PM Jungtaek Lim <[email protected]>
> wrote:
>
>> Maybe my biggest worry about this kind of attempt is accuracy. If it
>> gives false positives, it will just add overhead to the review phase by
>> pushing the reviewer to check the validation manually, which is
>> "additional" overhead. I wouldn't be happy if I got another phase
>> in addition to the current review process.
>>
>> We get AI slop exactly because of accuracy problems. How is this battle
>> tested? Do you have proof of its accuracy? Linter failures are almost
>> always obvious and false positives are really rare (at least I haven't
>> seen one), so I don't mind the linter check. I would mind an additional
>> process if it does not guarantee (or at least give a sense of) its
>> accuracy.
>>
>> On Wed, Mar 18, 2026 at 6:23 AM vaquar khan <[email protected]>
>> wrote:
>>
>>> Hi Team,
>>>
>>> AI is a really hot topic in all Apache projects nowadays, and I wanted
>>> to kick off a discussion around a new SPIP I've been putting together. With
>>> the sheer volume of contributions we handle, relying entirely on PR
>>> templates and manual review to filter out AI-generated slop is just burning
>>> out maintainers. We've seen other projects like curl and Airflow get
>>> completely hammered by this stuff lately, and I think we need a hard
>>> technical defense.
>>>
>>> I'm proposing the Automated Integrity Validation (AIV) Gate. Basically,
>>> it's a local CI job that parses the AST of a PR (using Python, jAST, and
>>> tree-sitter-scala) to catch submissions that are mostly empty scaffolding
>>> or violate our specific design rules (like missing .stop() calls or using
>>> Await.result).
>>>
>>> To keep our pipeline completely secure from CI supply chain attacks,
>>> this runs 100% locally in our dev/ directory, with zero external API calls. If
>>> the tooling ever messes up or a committer needs to force a hotfix, you can
>>> just bypass it instantly with a GPG-signed commit containing '/aiv skip'.
>>>
>>> I think the safest way to roll this out without disrupting anyone's
>>> workflow is starting it in a non-blocking "Shadow Mode" just to gather data
>>> and tune the thresholds.
>>>
>>> I've attached the full SPIP draft below which dives into all the
>>> technical weeds, the rollout plan, and a FAQ. Would love to hear your
>>> thoughts!
>>>
>>>
>>> https://docs.google.com/document/d/1-PCSq0PT_B45MbXVxkJ_E3GUHvK-8VV6WxQjKSGEh9o/edit?tab=t.0#heading=h.e8ahm4jtqclh
>>>
>>> --
>>> Regards,
>>> Viquar Khan
>>> *Linkedin *-https://www.linkedin.com/in/vaquar-khan-b695577/
>>> *Book *-
>>> https://us.amazon.com/stores/Vaquar-Khan/author/B0DMJCG9W6?ref=ap_rdr&shoppingPortalEnabled=true
>>> *GitBook*-
>>> https://vaquarkhan.github.io/microservices-recipes-a-free-gitbook/
>>> *Stack *-https://stackoverflow.com/users/4812170/vaquar-khan
>>> *github*-https://github.com/vaquarkhan/aiv-integrity-gate
>>>
>>
>
> --
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> <https://www.fighthealthinsurance.com/?q=hk_email>
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>
