Sure, let me update the SPIP doc with supporting links.

On Tue, Mar 17, 2026, 4:44 PM Dongjoon Hyun <[email protected]> wrote:

> Hi Viquar,
>
> Thank you for sharing this.
>
> While reviewing the SPIP, I noticed that we might need more concrete data
> to support the claims regarding the recent surge in the Apache Spark
> community, specifically this section:
>
> > Why Now: The Open Source Automated Contribution Crisis: The open-source
> ecosystem is experiencing an unprecedented surge in automated, low-quality
> pull requests. This is not a theoretical concern—it is an active,
> documented crisis affecting Apache projects and the broader community:
> > Apache Spark's Own Data (Verified from Commit History): Spark added a
> generative tooling disclosure checkbox to its PR template on August 19,
> 2023. Analysis of commit history shows machine-assisted commits
> accelerating: 9 in 2024, 23 in 2025, and 35 in just the first 45 days of
> 2026. Only ~1-2% of commits currently disclose automated tooling usage, but
> disclosure is voluntary and unverifiable; the actual percentage is likely
> much higher.
>
> Just FYI, please note that most of the recent `Generated-By: ` commits came
> from active Apache Spark PMC members (like me, Kent, and Yang). This is due
> to recent vendor promotions (like the Claude Code OSS program, the Google
> Antigravity Ultra Plan discount, and Copilot). These commits reflect real
> productivity enhancements, not an attack of AI slop.
>
> Additionally, as a point of context, our community has already taken
> proactive measures to safeguard against low-quality AI-generated
> contributions. We currently maintain a human-in-the-loop system—such as
> requiring an ASF JIRA ticket to be created before submitting a PR—to help
> mitigate this issue.
>
> So, we may want to revisit this topic later with concrete and numerous
> examples of AI slop in the Spark pull request list.
>
> Sincerely,
> Dongjoon Hyun
>
>
> On 2026/03/17 21:22:55 vaquar khan wrote:
> > Hi Team,
> >
> > AI is a really hot topic in all Apache projects nowadays, and I wanted to
> > kick off a discussion around a new SPIP I've been putting together. With
> > the sheer volume of contributions we handle, relying entirely on PR
> > templates and manual review to filter out AI-generated slop is just
> > burning out maintainers. We've seen other projects like curl and Airflow
> > get completely hammered by this stuff lately, and I think we need a hard
> > technical defense.
> >
> > I'm proposing the Automated Integrity Validation (AIV) Gate. Basically,
> > it's a local CI job that parses the AST of a PR (using Python, jAST, and
> > tree-sitter-scala) to catch submissions that are mostly empty scaffolding
> > or violate our specific design rules (like missing .stop() calls or using
> > Await.result).
> >
> > To keep our pipeline completely secure from CI supply chain attacks, this
> > runs 100% locally in our dev/ directory, with zero external API calls. If
> > the tooling ever messes up or a committer needs to force a hotfix, you can
> > just bypass it instantly with a GPG-signed commit containing '/aiv skip'.
> >
> > I think the safest way to roll this out without disrupting anyone's
> > workflow is starting it in a non-blocking "Shadow Mode" just to gather
> > data and tune the thresholds.
> >
> > I've attached the full SPIP draft below, which dives into all the
> > technical weeds, the rollout plan, and a FAQ. Would love to hear your
> > thoughts!
> >
> >
> https://docs.google.com/document/d/1-PCSq0PT_B45MbXVxkJ_E3GUHvK-8VV6WxQjKSGEh9o/edit?tab=t.0#heading=h.e8ahm4jtqclh
> >
> > --
> > Regards,
> > Viquar Khan
> > *Linkedin *-https://www.linkedin.com/in/vaquar-khan-b695577/
> > *Book *-
> >
> https://us.amazon.com/stores/Vaquar-Khan/author/B0DMJCG9W6?ref=ap_rdr&shoppingPortalEnabled=true
> > *GitBook*-
> https://vaquarkhan.github.io/microservices-recipes-a-free-gitbook/
> > *Stack *-https://stackoverflow.com/users/4812170/vaquar-khan
> > *github*-https://github.com/vaquarkhan/aiv-integrity-gate
> >
>
