Re: [DISCUSS] SQL support in Cassandra

Jeff Jirsa Sat, 01 Nov 2025 08:25:53 -0700

You can.

But you can also build that same layer as a stateless service right now and not worry about trying to contain the CQL-isms that expose the clustering concepts.

On Nov 1, 2025, at 8:04 AM, Patrick McFadin wrote:

This opens up an entire line of discussion about the bigger goal of Cassandra becoming a fully cloud native DB but I’m here for it.

I’m not going to disagree with Jeff’s point. There is prior art that shows this is a way forward. What the DataStax team did, splitting up parts of Cassandra to deploy independently, has made multi-tenancy a proven direction that works in production. Specifically, what Jeff is proposing with a stateless service above a KV storage backend is exactly what TiDB does, but with MySQL support.
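
To make the "stateless service above a KV storage backend" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `KVStore`, `row_key`, and the `sql_*` helpers are hypothetical names, the key encoding is invented (it is not TiDB's or Cassandra's actual format), and primary keys are assumed to be non-negative integers.

```python
from bisect import bisect_left, insort
from typing import Dict, List, Optional, Tuple


class KVStore:
    """Stand-in for the storage tier: an ordered key-value map."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._data:
            insort(self._keys, key)  # keep keys sorted for range scans
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def scan(self, prefix: bytes) -> List[Tuple[bytes, bytes]]:
        # Ordered prefix scan: start at the first key >= prefix.
        out = []
        for k in self._keys[bisect_left(self._keys, prefix):]:
            if not k.startswith(prefix):
                break
            out.append((k, self._data[k]))
        return out


def row_key(table: str, pk: int) -> bytes:
    # Hypothetical encoding: table name, separator, zero-padded primary key,
    # so that byte order matches numeric order for non-negative keys.
    return f"{table}/{pk:020d}".encode()


def sql_insert(store: KVStore, table: str, pk: int, row: str) -> None:
    # The SQL layer keeps no state of its own; everything lives in `store`.
    store.put(row_key(table, pk), row.encode())


def sql_select_by_pk(store: KVStore, table: str, pk: int) -> Optional[str]:
    value = store.get(row_key(table, pk))
    return None if value is None else value.decode()


def sql_select_all(store: KVStore, table: str) -> List[str]:
    # A full-table read becomes a prefix scan over the table's key range.
    return [v.decode() for _, v in store.scan(f"{table}/".encode())]
```

Because the `sql_*` functions take the store as an argument and hold nothing between calls, any number of such frontends could in principle run side by side against the same storage tier.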

Going back to my talk at CoC, this is what I firmly believe. Our moat is the storage engine and how we continue to evolve it for more use cases while sticking to the fundamentals of durability, distribution and scale. Accord, TCM and what’s now being proposed in CEP-57 are all critical elements to supporting more diverse workloads and holding the line on our core values.

With a fundamental re-architecture like this, why couldn’t we just use existing projects like Apache Calcite (https://calcite.apache.org/) and Substrait (https://substrait.io/)?

Patrick

On Oct 31, 2025, at 7:16 PM, C. Scott Andreas wrote:

Jeff’s thoughts are mine exactly, and how I would imagine building this.

– Scott

—
Mobile

On Oct 31, 2025, at 9:53 PM, Jon Haddad wrote:

I agree that new features should leverage SQL syntax. I can't think of a reason not to.

On Fri, Oct 31, 2025 at 5:00 PM Patrick McFadin wrote:
I knew this would be a lot of information to try to convey. I swear it sounded amazing in my head :D

Let’s split this into two conversations: Phase 1, and everything after it.

The Phase 1 part was in response to some recent discussions on current CEPs: CEP-55 (https://lists.apache.org/thread/4swcf1n4qm7ps6g4brv2wnrql8n72p61) and CEP-52 (https://lists.apache.org/thread/8rcp808jb4y2jy2sttkhx0fv71qxnddf). In those threads, the syntax for the changes would have been unique to CQL. My suggestion was to just use the syntax for similar features in pgSQL. Jyothsna is finishing up CEP-52 and the syntax is pgSQL, so anyone using that new-to-Cassandra feature will find familiar syntax.

In those threads, it was correctly pointed out that we don’t have any agreed-upon guidelines. My proposal with Phase 1 is just that: for any new feature that is proposed, we defer to the pgSQL format whenever possible. And no, I'm not proposing we backport anything! (Not going there)

I don’t think that is a CEP, however it does need some formality. Maybe a VOTE thread and it’s just policy?

Jeff and Dinesh jumped into Phase 2, which is really the fun and interesting part. To be clear, I am not proposing we make any changes before Cassandra 6 in this case. And this will be a CEP or two or three.

To directly answer the questions, and as my first shot at imagining an implementation in Phase 2, I think this is a matter of making QueryProcessor [1] pluggable. I can’t take credit for this idea. It’s been floated a few times, but in this case it might be the best place to start. And to answer Jeff, this isn’t transforming CQL. I do think this is a new implementation.

Then possibly you could run both CQL and SQL at the same time. It’s just a matter of what gets sent down to the storage engine. And then there is CEP-39 [2].
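
As a rough sketch of what "pluggable" could mean here: two frontends registered side by side, each turning statement text into the same dialect-neutral operation for the storage engine. This is not the actual QueryProcessor API; `ReadOp`, `parse_cql`, `parse_sql`, and `PluggableQueryProcessor` are hypothetical names, and the parsers are toys that only understand a single-key `SELECT *`.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ReadOp:
    """Dialect-neutral request handed to the storage engine (hypothetical)."""
    table: str
    key: str


# A frontend turns statement text into a storage-level operation.
Frontend = Callable[[str], ReadOp]

_SELECT = re.compile(
    r"^\s*SELECT\s+\*\s+FROM\s+(\w+)\s+WHERE\s+id\s*=\s*(\S+?)\s*;?\s*$",
    re.IGNORECASE,
)


def parse_cql(text: str) -> ReadOp:
    # Toy CQL frontend: only understands SELECT * FROM t WHERE id = v.
    m = _SELECT.match(text)
    if m is None:
        raise ValueError(f"unsupported CQL: {text!r}")
    return ReadOp(table=m.group(1).lower(), key=m.group(2))


def parse_sql(text: str) -> ReadOp:
    # Toy SQL frontend: same shape, but it also accepts a PostgreSQL-style
    # cast such as 42::int and strips it before building the operation.
    m = _SELECT.match(text)
    if m is None:
        raise ValueError(f"unsupported SQL: {text!r}")
    return ReadOp(table=m.group(1).lower(), key=m.group(2).split("::")[0])


class PluggableQueryProcessor:
    """Dispatches to whichever frontends the operator has enabled."""

    def __init__(self) -> None:
        self._frontends: Dict[str, Frontend] = {}

    def register(self, dialect: str, frontend: Frontend) -> None:
        self._frontends[dialect] = frontend

    def process(self, dialect: str, text: str) -> ReadOp:
        if dialect not in self._frontends:
            raise KeyError(f"dialect not enabled: {dialect}")
        return self._frontends[dialect](text)
```

With both frontends registered, a CQL statement and its SQL equivalent produce the same `ReadOp`, which is the sense in which only "what gets sent down to the storage engine" matters.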

Jeff’s point about BOP is really interesting. And let’s not forget about CEP-57 [3], which was just proposed. The point is that we have a lot of future changes that, if everything is aligned, could come together in some interesting ways. We have to agree on the directionality as a baseline. It’s a strategy, not a plan.

1 - https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/QueryProcessor.java
2 - https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer
3 - https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-57%3A+Flat+keys+and+trie+interfaces

On Oct 31, 2025, at 3:22 PM, Ekaterina Dimitrova wrote:

Hey Patrick,

Thanks for starting this discussion.
I am also curious to read the response to Dinesh’s questions.

Plus I have one to add myself (potentially more after I spend more time on this): it is not clear to me what Phase 1 is. Do you suggest blocking 6.0 alpha to review all new, not-yet-released syntax to try to align it with SQL? Do you plan to open a CEP and work on that? Or did I misunderstand what you are suggesting?

Best regards,
Ekaterina

On Fri, 31 Oct 2025 at 18:15, Dinesh Joshi wrote:
Thank you Patrick for starting this thread. Your talk was interesting. I want to better understand the compatibility aspect of what you're proposing. Specifically, how do you envision the following scenarios being supported in this new world:

1. Could an operator enable CQL and SQL simultaneously?
2. Does the user need to pick CQL or SQL at the time of Keyspace creation or can they switch between CQL and SQL on the fly?
3. Would the user be able to read and write to the same Keyspace using both CQL and SQL?
4. Do you envision the user being able to write using CQL and read using SQL?

Thanks,

Dinesh

On Fri, Oct 31, 2025 at 1:26 PM Patrick McFadin wrote:
Over the last decade, CQL has served Cassandra users well by offering a familiar SQL-like interface for a distributed data model. However, as the broader database ecosystem converges on PostgreSQL-style SQL as the de facto standard for developers, it’s time to consider how Cassandra evolves to meet developers where they are without losing what makes it unique.

The great thing about SQL standards is that there are plenty to choose from. While the formal SQL:2023 specification (ISO/IEC 9075) exists, the industry has coalesced around the PostgreSQL dialect. Products such as AWS Aurora, AlloyDB, CockroachDB, YugabyteDB, and DuckDB, along with many others offering “PostgreSQL-compatible” modes, have validated this direction. Developers are voting with their implementations. PostgreSQL SQL represents the lowest cognitive-load interface for application data, as repeatedly confirmed by developer surveys like Stack Overflow 2025[1].

What I’m proposing is that we begin to normalize the frontend to expand access to our extraordinary backend. The key principle here is ADD, not DELETE. CQL continues to work and be supported while we expand Cassandra’s capabilities through SQL compatibility, providing a familiar syntax and potentially supporting a larger ecosystem (JDBC, etc.).

Phase 1 (Before Cassandra 6) - Stop Digging
Freeze CQL at version 3 and align all new syntax or features (DML/DDL) to the PostgreSQL SQL dialect wherever possible. This approach was already demonstrated with CEP-52 and should become our norm.

Phase 2 (Years) - Create Parallel Paths
This is where we take our time and do things carefully, most likely over a series of years. Don’t touch the CQL path. Add an opt-in, feature-flagged path for SQL-only that conforms to the PostgreSQL SQL dialect. Begin our journey to feature compatibility here. At Community over Code this year, Alex Petrov and I sat in Aaron Ploetz’s kitchen (thanks for dinner, Aaron!) and brainstormed how this could work. The two critical aspects to manage are types and functionality. We may never be able to support everything, but given what this project has accomplished over the years, I wouldn’t bet against us. Being clear about the differences early on can serve as a roadmap for future contributors who want to be involved.

In discussion with Joel Shepherd on this topic, he sagely suggested some sub-steps inside this phase:

1 - Prioritize SQL that is compatible to get the incremental wins and early feedback from the user community.
2 - Tackle the non-compatible and triage for the long-term changes that would need to happen.

I took the time to do some rough mapping of syntax, features, and types:

Function and Feature Compatibility tables: https://docs.google.com/document/d/1K2-GKVM4Z_u1Hb1GtdrRyC9AdDN3RLwJ7LX_i_PqkOE/edit?usp=sharing

Typing differences: https://docs.google.com/spreadsheets/d/11tWkyCQ8WAFGnd5Va6iyltkp1wbKdAubxH9o_ZyJEtk/edit?usp=sharing
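
For a feel of what a type triage along those two sub-steps might look like, here is a small sketch. The mapping table is my own rough assumption about nearest PostgreSQL counterparts for a handful of CQL types; it is not taken from the linked spreadsheet and is far from complete. `PgType`, `CQL_TO_PG`, and `triage` are hypothetical names.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class PgType(NamedTuple):
    name: Optional[str]  # None means no obvious PostgreSQL counterpart
    exact: bool          # False when range or semantics differ


# Assumed, illustrative mapping from CQL type names to PostgreSQL types.
CQL_TO_PG = {
    "text": PgType("text", True),
    "int": PgType("integer", True),
    "bigint": PgType("bigint", True),
    "boolean": PgType("boolean", True),
    "double": PgType("double precision", True),
    "uuid": PgType("uuid", True),
    "blob": PgType("bytea", True),
    "tinyint": PgType("smallint", False),   # widened: no 8-bit integer
    "timeuuid": PgType("uuid", False),      # time-ordering semantics lost
    "varint": PgType("numeric", False),     # integer stored as numeric
    "counter": PgType(None, False),         # no direct counterpart
}


def triage(cql_types: Iterable[str]) -> Tuple[List[str], List[str], List[str]]:
    """Split column types into compatible / needs-review / unsupported."""
    compatible, review, unsupported = [], [], []
    for t in cql_types:
        pg = CQL_TO_PG.get(t, PgType(None, False))
        if pg.name is None:
            unsupported.append(t)
        elif pg.exact:
            compatible.append(t)
        else:
            review.append(t)
    return compatible, review, unsupported
```

The first bucket corresponds to the incremental wins in sub-step 1; the other two are candidates for the longer-term triage in sub-step 2.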

Phase 3 (Indefinite timeframe) - Become Default SQL
Once the SQL path achieves sufficient coverage and confidence, we can make it the default frontend, with CQL continuing to be supported indefinitely. The intent is not replacement but evolution toward broader accessibility.

This proposal is an invitation for discussion. Feedback from contributors, driver maintainers, and downstream users will guide the roadmap and priorities. The result will be the creation of CEPs as needed. If we get this right, Cassandra’s next decade will be defined by reach, compatibility, and continued excellence in scalability.

If you saw my talk in Minneapolis[2], you know I've been thinking about what we can accomplish in 10 years. The Phase 1 piece is near-term, but there is no timeframe for everything else. The best consensus I can hope for today is on directionality, and that starts with Phase 1.

Patrick

1 - https://survey.stackoverflow.co/2025/technology#most-popular-technologies-database-prof
2 - https://youtu.be/rIh968dSlkQ
