Sure, let me update the SPIP doc with supporting links.

On Tue, Mar 17, 2026, 4:44 PM Dongjoon Hyun <[email protected]> wrote:
> Hi Viquar,
>
> Thank you for sharing this.
>
> While reviewing the SPIP, I noticed that we might need more concrete data to support the claims regarding the recent surge in the Apache Spark community, specifically this section:
>
> > Why Now: The Open Source Automated Contribution Crisis: The open-source ecosystem is experiencing an unprecedented surge in automated, low-quality pull requests. This is not a theoretical concern; it is an active, documented crisis affecting Apache projects and the broader community:
> >
> > Apache Spark's Own Data (Verified from Commit History): Spark added a generative tooling disclosure checkbox to its PR template on August 19, 2023. Analysis of commit history shows machine-assisted commits accelerating: 9 in 2024, 23 in 2025, and 35 in just the first 45 days of 2026. Only ~1-2% of commits currently disclose automated tooling usage, but disclosure is voluntary and unverifiable; the actual percentage is likely much higher.
>
> Just FYI, please note that the recent `Generated-By: ` commits came mostly from active Apache Spark PMC members (like me, Kent, Yang). This is because of recent promotions from vendors (like the Claude Code OSS program, Google Antigravity Ultra Plan Discount, and Copilot). These are genuine productivity enhancements rather than an attack of AI slop.
>
> Additionally, as a point of context, our community has already taken proactive measures to safeguard against low-quality AI-generated contributions. We currently maintain a human-in-the-loop system, such as requiring an ASF JIRA ticket to be created before submitting a PR, to help mitigate this issue.
>
> So, we may want to revisit this topic later with concrete, large-scale examples of AI slop in the Spark pull request list.
>
> Sincerely,
> Dongjoon Hyun
>
>
> On 2026/03/17 21:22:55 vaquar khan wrote:
> > Hi Team,
> >
> > AI is a really hot topic across all Apache projects right now, and I wanted to kick off a discussion around a new SPIP I've been putting together. With the sheer volume of contributions we handle, relying entirely on PR templates and manual review to filter out AI-generated slop is just burning out maintainers. We've seen other projects like curl and Airflow get completely hammered by this stuff lately, and I think we need a hard technical defense.
> >
> > I'm proposing the Automated Integrity Validation (AIV) Gate. Basically, it's a local CI job that parses the AST of a PR (using Python, jAST, and tree-sitter-scala) to catch submissions that are mostly empty scaffolding or violate our specific design rules (like missing .stop() calls or using Await.result).
> >
> > To keep our pipeline completely secure from CI supply chain attacks, this runs 100% locally in our dev/ directory, with zero external API calls. If the tooling ever messes up or a committer needs to force a hotfix, you can just bypass it instantly with a GPG-signed commit containing '/aiv skip'.
> >
> > I think the safest way to roll this out without disrupting anyone's workflow is to start it in a non-blocking "Shadow Mode" just to gather data and tune the thresholds.
> >
> > I've attached the full SPIP draft below, which dives into all the technical weeds, the rollout plan, and a FAQ. Would love to hear your thoughts!
> >
> > https://docs.google.com/document/d/1-PCSq0PT_B45MbXVxkJ_E3GUHvK-8VV6WxQjKSGEh9o/edit?tab=t.0#heading=h.e8ahm4jtqclh
> >
> > --
> > Regards,
> > Viquar Khan
> > *Linkedin* - https://www.linkedin.com/in/vaquar-khan-b695577/
> > *Book* - https://us.amazon.com/stores/Vaquar-Khan/author/B0DMJCG9W6?ref=ap_rdr&shoppingPortalEnabled=true
> > *GitBook* - https://vaquarkhan.github.io/microservices-recipes-a-free-gitbook/
> > *Stack* - https://stackoverflow.com/users/4812170/vaquar-khan
> > *github* - https://github.com/vaquarkhan/aiv-integrity-gate
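
For readers following the thread, the "mostly empty scaffolding" check and the non-blocking "Shadow Mode" mentioned in the proposal can be illustrated with a minimal sketch. It covers Python sources only and uses the standard-library ast module rather than the tree-sitter-scala tooling the proposal names. The function names, the 0.5 threshold, and the shadow_mode flag are hypothetical and are not taken from the SPIP.

```python
import ast


def is_stub(func: ast.AST) -> bool:
    """True if a function body holds only `pass` or constant expressions."""
    for stmt in func.body:
        if isinstance(stmt, ast.Pass):
            continue
        if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant):
            # Covers docstrings, a bare `...`, and other constant expressions.
            continue
        return False
    return True


def scaffolding_ratio(source: str) -> float:
    """Fraction of function definitions in `source` that are stubs."""
    tree = ast.parse(source)
    funcs = [
        node for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    if not funcs:
        return 0.0
    return sum(is_stub(f) for f in funcs) / len(funcs)


def check(source: str, threshold: float = 0.5, shadow_mode: bool = True) -> int:
    """Return a CI exit code: always 0 in shadow mode, 1 on a blocking failure.

    The threshold is a hypothetical value; the thread says real thresholds
    would be tuned from data gathered in shadow mode.
    """
    ratio = scaffolding_ratio(source)
    flagged = ratio > threshold
    if flagged:
        print(f"AIV: {ratio:.0%} of functions are stubs (threshold {threshold:.0%})")
    return 1 if flagged and not shadow_mode else 0
```

In shadow mode the check only reports and never fails the build, which matches the stated goal of gathering data before any enforcement is turned on.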
