I think that Calcite (and the wider ASF) needs to embrace AI — to do otherwise will make us irrelevant — but we need to avoid some of the excesses that will make the project impossible to maintain.
The enemies are slop (large volumes of code of dubious value) and inconsistency (code whose design contradicts the logic elsewhere in the project). I agree with Mihai’s principle — allow a contribution as long as “the result is arguably correct and reviewers that we trust can understand it” — and note that it is basically a Turing test for good programmers. We know that good programmers produce clear, concise, understandable code, and bad programmers don’t.

Our scarcest resource is reviewer time. I would like to see reviewers push back hard on PRs that are too verbose, are not consistent with how Calcite does things, or do not clearly state the problem they are trying to fix. If a PR is unfocused, reviewers should just ignore it. The person who submitted the PR — yes, legally, it’s always a person who submits a PR — needs to make it fit for review. I strongly believe that we should require a Jira case, with a good summary and description, before we read a single line of a PR’s code.

Last, I think vibe-coding will be increasingly important. In vibe-coding, the code is written by AI and is never reviewed by a human, but we trust it because there is a comprehensive specification and tests, written by humans. I propose that we have “vibe-coded” components in Calcite, where we put extra effort into reviewing tests and specification, and less effort into reviewing code. In Calcite, SQL functions and SQL dialects would be good candidates. One of the benefits of vibe-coding, in this manner, is that from one test suite you can create and maintain implementations in multiple languages. (In my Morel project, I was easily able to port Morel’s standard library from Java to Rust.)

My vision is that five years from now, Calcite will be in multiple languages — say Java, Rust and Python — with a large, healthy, efficient test suite, and with about half of the non-test code either vibe-coded or translated by AI from the original Java.

Julian
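
P.S. To make the “review the tests and the specification, not the code” idea concrete, below is a rough sketch of what a human-reviewed specification for one vibe-coded SQL function might look like. The names (CharLengthSpecTest, VibeCodedFunctions) are invented for illustration and are not existing Calcite API; the point is only that reviewer effort goes into the expected values, while the implementation behind them can be regenerated by AI in Java, Rust or Python and simply has to keep passing the same cases.

import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.stream.Stream;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.Arguments;
import org.junit.jupiter.params.provider.MethodSource;

/**
 * Human-written specification for a hypothetical vibe-coded CHAR_LENGTH
 * component. Reviewers sign off on the cases below; the implementation
 * behind them could be generated, and regenerated, by AI.
 */
class CharLengthSpecTest {
  /** Each case: input string, expected character count. */
  static Stream<Arguments> cases() {
    return Stream.of(
        Arguments.of("", 0),
        Arguments.of("abc", 3),
        Arguments.of("héllo", 5)); // counts characters, not bytes
  }

  @ParameterizedTest
  @MethodSource("cases")
  void charLengthMatchesSpec(String input, int expected) {
    assertEquals(expected, VibeCodedFunctions.charLength(input));
  }

  /** Stand-in for the generated code; in the vibe-coded model this body is
   * written by AI and only the cases above are reviewed by humans. */
  static class VibeCodedFunctions {
    static int charLength(String s) {
      return s.codePointCount(0, s.length()); // characters (code points)
    }
  }
}

The same table of cases can drive a port in another language, which is what makes cross-language maintenance cheap.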
> On Jan 12, 2026, at 12:49 AM, Stamatis Zampetakis <[email protected]> wrote:
>
> It depends how and to what extent AI is used in a contribution so we
> probably have to make a decision on a case by case basis. Note that
> ASF already provides some guidelines on how/when AI can be used [1].
> Obviously, using AI for generating lots of "new" code is quite risky
> and quite impractical to verify copyrights so it should be avoided.
>
> [1] https://www.apache.org/legal/generative-tooling.html
>
> On Mon, Jan 12, 2026 at 6:12 AM Dmitry Sysolyatin
> <[email protected]> wrote:
>>
>> If AI is used to search for answers to project-related questions (although
>> one should be careful here when there is a lot of legacy), for
>> self-validation, to help find a solution, or for translating from one
>> language to another (specifically a 1-to-1 translation), I don’t see
>> anything wrong with that.
>>
>> However, I am quite skeptical about using it to implement solutions. This
>> is up to each individual developer whether they use it or not as long as it
>> is not clearly visible that the code (which is sometimes very obvious) or
>> the comment is AI-generated (by “generation” I mean not translating one’s
>> own text from one language to another 1-to-1, but actual generation). In
>> such cases, it becomes unclear whether the developer actually understands
>> what they have written, and whether it is worth continuing the review, the
>> discussion, and spending time on it.
>>
>> In the case of Apache Calcite, I have seen only once such a case. But in
>> other projects, AI-generated issues and fixes sometimes reach the point of
>> absurdity.
>>
>> On Mon, Jan 12, 2026 at 1:08 AM jensen <[email protected]> wrote:
>>
>>> Personally, I think using AI tools has its advantages; they often help us
>>> quickly locate simple problems. For the Calcite community, we have many
>>> experienced reviewers, and as long as we don't completely rely on AI tools
>>> to review code, I think it's acceptable. As for contributors, it's best to
>>> explain their thought process behind the changes (or provide good code
>>> comments), and ideally, to demonstrate whether the changes are reasonable
>>> (of course, new contributors may not be able to confirm the reasonableness
>>> of their changes even without using AI). If these things can be done to a
>>> certain extent, it will reduce the time and effort reviewers need to put in.
>>>
>>> Best regards,
>>>
>>> Zhen Chen
>>>
>>> ---- Replied Message ----
>>> | From | Mihai Budiu<[email protected]> |
>>> | Date | 1/12/2026 06:03 |
>>> | To | [email protected]<[email protected]> |
>>> | Subject | Re: AI/LLM and Calcite contributions |
>>>
>>> I personally do not care which tools have been used as long as the result
>>> is arguably correct and reviewers that we trust can understand it.
>>>
>>> Mihai
>>>
>>> ________________________________
>>> From: Alessandro Solimando <[email protected]>
>>> Sent: Sunday, January 11, 2026 12:22 PM
>>> To: [email protected] <[email protected]>
>>> Subject: AI/LLM and Calcite contributions
>>>
>>> Hello,
>>> a recent discussion [1] made me realize that, as a community, we haven't
>>> made a precise statement if LLM-assisted contributions should be accepted,
>>> and in case how they should be handled.
>>>
>>> Dmitry cites [2] in the discussion (on the strict side of the spectrum),
>>> while I have seen more nuanced statements in the Apache foundation like [3]
>>> (fine as long as you understand and can justify all you submitted).
>>>
>>> I'd like to hear your opinions, and ideally update the contributors
>>> guideline accordingly, when we reach consensus.
>>>
>>> Best regards,
>>> Alessandro
>>>
>>> 1: https://github.com/apache/calcite/pull/4692#discussion_r2639007178
>>> 2: https://wiki.gentoo.org/wiki/Project:Council/AI_policy
>>> 3: https://datafusion.apache.org/contributor-guide/index.html#ai-assisted-contributions
