Hi Tian, I have spent a significant amount of time on this proposal and have already shared the link to the Google Doc. Please review it thoroughly rather than making assumptions; it is essential to have a working prototype before proposing any technical solution. I hope you understand.
SPIP - https://docs.google.com/document/d/1-PCSq0PT_B45MbXVxkJ_E3GUHvK-8VV6WxQjKSGEh9o/edit?tab=t.0#heading=h.e8ahm4jtqclh

Regards,
Viquar Khan

On Tue, 17 Mar 2026 at 16:57, Tian Gao via dev <[email protected]> wrote:

> I guess Vaquar is talking about
> https://github.com/vaquarkhan/aiv-integrity-gate , which I assume is a
> new project he developed. Forgive me if this feels like a promotion to me.
>
> Tian
>
> On Tue, Mar 17, 2026 at 2:52 PM vaquar khan <[email protected]> wrote:
>
>> Sure, let me update the SPIP doc with supporting links
>>
>> On Tue, Mar 17, 2026, 4:44 PM Dongjoon Hyun <[email protected]> wrote:
>>
>>> Hi Viquar,
>>>
>>> Thank you for sharing this.
>>>
>>> While reviewing the SPIP, I noticed that we might need more concrete
>>> data to support the claims regarding the recent surge in the Apache
>>> Spark community, specifically this section:
>>>
>>> > Why Now: The Open Source Automated Contribution Crisis: The
>>> > open-source ecosystem is experiencing an unprecedented surge in
>>> > automated, low-quality pull requests. This is not a theoretical
>>> > concern; it is an active, documented crisis affecting Apache projects
>>> > and the broader community:
>>> > Apache Spark's Own Data (Verified from Commit History): Spark added a
>>> > generative tooling disclosure checkbox to its PR template on August 19,
>>> > 2023. Analysis of commit history shows machine-assisted commits
>>> > accelerating: 9 in 2024, 23 in 2025, and 35 in just the first 45 days
>>> > of 2026. Only ~1-2% of commits currently disclose automated tooling
>>> > usage, but disclosure is voluntary and unverifiable; the actual
>>> > percentage is likely much higher.
>>>
>>> Just FYI, please note that the recent `Generated-By: ` commits came
>>> mostly from active Apache Spark PMC members (like me, Kent, Yang). It's
>>> because of the recent promotion from the vendors (like Claude Code OSS
>>> program, Google Antigravity Ultra Plan Discount, and Copilot).
>>> It's truly productivity enhancement rather than an attack of AI slop.
>>>
>>> Additionally, as a point of context, our community has already taken
>>> proactive measures to safeguard against low-quality AI-generated
>>> contributions. We currently maintain a human-in-the-loop system, such
>>> as requiring an ASF JIRA ticket to be created before submitting a PR,
>>> to help mitigate this issue.
>>>
>>> So, we may want to revisit this topic later with concrete and
>>> massive examples of AI slop in the Spark Pull Request list.
>>>
>>> Sincerely,
>>> Dongjoon Hyun
>>>
>>>
>>> On 2026/03/17 21:22:55 vaquar khan wrote:
>>> > Hi Team,
>>> >
>>> > Nowadays a really hot topic in all Apache Projects is AI and I wanted
>>> > to kick off a discussion around a new SPIP I've been putting together.
>>> > With the sheer volume of contributions we handle, relying entirely on
>>> > PR templates and manual review to filter out AI-generated slop is just
>>> > burning out maintainers. We've seen other projects like curl and
>>> > Airflow get completely hammered by this stuff lately, and I think we
>>> > need a hard technical defense.
>>> >
>>> > I'm proposing the Automated Integrity Validation (AIV) Gate. Basically,
>>> > it's a local CI job that parses the AST of a PR (using Python, jAST,
>>> > and tree-sitter-scala) to catch submissions that are mostly empty
>>> > scaffolding or violate our specific design rules (like missing .stop()
>>> > calls or using Await.result).
>>> >
>>> > To keep our pipeline completely secure from CI supply chain attacks,
>>> > this runs 100% locally in our dev/ directory; zero external API calls.
>>> > If the tooling ever messes up or a committer needs to force a hotfix,
>>> > you can just bypass it instantly with a GPG-signed commit containing
>>> > '/aiv skip'.
>>> >
>>> > I think the safest way to roll this out without disrupting anyone's
>>> > workflow is starting it in a non-blocking "Shadow Mode" just to gather
>>> > data and tune the thresholds.
>>> >
>>> > I've attached the full SPIP draft below which dives into all the
>>> > technical weeds, the rollout plan, and a FAQ. Would love to hear your
>>> > thoughts!
>>> >
>>> > https://docs.google.com/document/d/1-PCSq0PT_B45MbXVxkJ_E3GUHvK-8VV6WxQjKSGEh9o/edit?tab=t.0#heading=h.e8ahm4jtqclh
>>> >
>>> > --
>>> > Regards,
>>> > Viquar Khan
>>> > *Linkedin* - https://www.linkedin.com/in/vaquar-khan-b695577/
>>> > *Book* - https://us.amazon.com/stores/Vaquar-Khan/author/B0DMJCG9W6?ref=ap_rdr&shoppingPortalEnabled=true
>>> > *GitBook* - https://vaquarkhan.github.io/microservices-recipes-a-free-gitbook/
>>> > *Stack* - https://stackoverflow.com/users/4812170/vaquar-khan
>>> > *github* - https://github.com/vaquarkhan/aiv-integrity-gate
>>> >
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: [email protected]
>>>
