Re: [DISCUSS] SQL support in Cassandra

Jeff Jirsa Tue, 04 Nov 2025 06:57:21 -0800

I’m sorta confused. You can do single-table design in SQL if you don’t have a join-centric workload. You still get to tell the database how to order your data on disk.
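To make the point above concrete, here is a minimal sketch of a single-table, query-driven design expressed in plain SQL. It uses SQLite (through Python's standard `sqlite3` module) purely as a stand-in engine; it is not Cassandra or Postgres. The `readings` table, its columns, and the `range_scan` helper are hypothetical names chosen for illustration. In SQLite, a `WITHOUT ROWID` table is stored as a B-tree keyed on the declared primary key, so the composite key plays a role loosely similar to a partition key plus a clustering column.

```python
import sqlite3

# In-memory database; SQLite is only a stand-in for "some SQL engine".
conn = sqlite3.connect(":memory:")

# Hypothetical time-series table. WITHOUT ROWID makes SQLite store rows
# ordered by the primary key (sensor_id, ts), i.e. the schema tells the
# engine how to order the data on disk.
conn.execute("""
    CREATE TABLE readings (
        sensor_id TEXT    NOT NULL,
        ts        INTEGER NOT NULL,
        value     REAL,
        PRIMARY KEY (sensor_id, ts)
    ) WITHOUT ROWID
""")

# Insert rows deliberately out of order.
rows = [("s1", 30, 1.5), ("s2", 10, 9.0), ("s1", 10, 0.5), ("s1", 20, 1.0)]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

def range_scan(conn, sensor_id, start, end):
    """Return (ts, value) pairs for one sensor with start <= ts < end."""
    cur = conn.execute(
        "SELECT ts, value FROM readings "
        "WHERE sensor_id = ? AND ts >= ? AND ts < ? "
        "ORDER BY ts",
        (sensor_id, start, end),
    )
    return cur.fetchall()

result = range_scan(conn, "s1", 10, 30)
print(result)
```

No joins are involved: the query names one key prefix and a range on the next key column, which is the same shape of access a table-per-query CQL model encourages.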

BOP gives you efficient range scans without the partition size problems that trap users once they cross into mega-partition territory. I don’t think you have to say clustering data together is Cassandra’s key benefit; virtually every database is doing that. We just happen to do it with chunks of the user’s set of data instead of all of it.

Similarly, suggesting that LSM / SSTables somehow benefit write-heavy CQL but not SQL is sorta weird, since the explosion of RocksDB-backed SQL databases makes it clear you can use LSM + SSTables for that too.

On Nov 4, 2025, at 5:43 AM, Josh McKenzie wrote:

+1 to Mick and Aleksey. I think the key for me was this:
One is Cassandra’s wide-partition model with flexible clustering columns, which supports very large, ordered partitions (e.g. time-series and efficient range scans), rather than a strictly normalised, join-centric model. These patterns don’t always map cleanly to SQL semantics, and CQL’s query-driven, table-per-query modelling helps move users toward designs that scale predictably.

We'd need really robust EXPLAIN / EXPLAIN ANALYZE support (see here) for users to be able to make sense of how their SQL queries translate into underlying disk access patterns. A wide-open field of full SQL compliance, which users then need to learn how to constrain in order to get horizontal scale out of it, would be much more challenging than the already somewhat "new" cognitive muscle our users have to build to realize that horizontal scaling of data access doesn't come free.
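As a rough illustration of the kind of visibility being asked for here, the sketch below uses SQLite's `EXPLAIN QUERY PLAN` (again via Python's standard `sqlite3` module, as a stand-in and not as a model of any Cassandra feature) to contrast a query constrained by the key with one that is not. The `readings` table and the `plan_details` helper are hypothetical. The exact wording of the plan text varies between SQLite versions, so the code only prints it rather than assuming a particular string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table, stored in primary-key order (WITHOUT ROWID).
conn.execute("""
    CREATE TABLE readings (
        sensor_id TEXT    NOT NULL,
        ts        INTEGER NOT NULL,
        value     REAL,
        PRIMARY KEY (sensor_id, ts)
    ) WITHOUT ROWID
""")

def plan_details(conn, sql, params=()):
    """Return the human-readable 'detail' column of each query-plan row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the fourth column of every EXPLAIN QUERY PLAN row.
    return [row[3] for row in rows]

# Constrained by the key prefix: the engine can seek directly into the key.
keyed = plan_details(
    conn,
    "SELECT ts, value FROM readings WHERE sensor_id = ? AND ts >= ?",
    ("s1", 10),
)

# Constrained only by a non-key column: the engine has to walk every row.
unkeyed = plan_details(
    conn,
    "SELECT ts, value FROM readings WHERE value > ?",
    (1.0,),
)

for label, plan in (("keyed", keyed), ("unkeyed", unkeyed)):
    print(label, plan)
```

The first plan is reported as a keyed search and the second as a full scan. In a distributed setting the analogous distinction would be between a query that touches one partition and one that fans out across the cluster, which is the information users would need in order to constrain their own SQL.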

I think that would give us a future state of "Use SQL when you need / want a lot of expressivity, use CQL when you need to be constrained to language primitives that keep your data access scalable". The part that gets me wary here is how we've run into pain in the past trying to be both a database that allows more query expressivity (ALLOW FILTERING, legacy 2i come to mind) and a database that also wants horizontal scale.

I'd love us to be able to have our cake and eat it too but I don't know if that's possible. So at the very least I'd advocate for SQL + CQL going forward, or SQL + a constrained "CQL-like" mode that gives the same constraints CQL does today on modeling that guide people towards that very partitionable path.

On Tue, Nov 4, 2025, at 8:12 AM, Aleksey Yeshchenko wrote:
I don’t mind us implementing some Postgres syntax support in some capacity, but I do not like the idea of limiting what Cassandra is allowed to do, or expose via CQL, to what is expressible by Postgres’s SQL.

Many moons ago, before we started work on native protocol and CQL, I could perhaps see a bigger benefit to going the Postgres route - for the client protocol and the language. We could piggyback on existing client infrastructure and SQL familiarity. But at this stage, when we have already made the effort to develop decent drivers, and CQL is fleshed out, and C* is quite mature overall, how much would we gain from this transition?

I’m broadly with Mick here. And I support using Postgres’ SQL as inspiration for implementing new CQL features wherever it makes sense - it’s something we’ve been doing for a decade already. But I don’t believe that deprecating CQL is the way to go at this point.

> On 4 Nov 2025, at 06:38, Mick wrote:
>
>
>
>> On 3 Nov 2025, at 20:32, Joel Shepherd wrote:
>>
>> At the same time, my personal opinion is that if SQL compatibility is pursued, then the end game should be to deprecate CQL. That will probably take years, but at the limit I don't see a lot of benefit to supporting both.
>
>
>
> We want SQL, but _why_ (in all its nuances) do we want SQL ? A lot is obvious, but it is a very broad question.
>
> The adoption and standardisation benefits are obvious, but CQL has strengths relative to SQL in Cassandra’s context.
>
> One is Cassandra’s wide-partition model with flexible clustering columns, which supports very large, ordered partitions (e.g. time-series and efficient range scans), rather than a strictly normalised, join-centric model. These patterns don’t always map cleanly to SQL semantics, and CQL’s query-driven, table-per-query modelling helps move users toward designs that scale predictably.
>
> I can see CQL continuing as Cassandra’s high-throughput, query-driven DSL, while we pursue SQL compatibility. I appreciate Dinesh’s ‘lanes’ framing, e.g. eventually default to a SQL interface (with Accord) for the broadest UX, while CQL remains a high-throughput path.
>
> Should we also be discussing storage-engine implications ? Cassandra’s LSMT/SSTable design optimises write paths, while SQL presents a logical view without constraining physical layout, so data on disk stays optimised for dominant access patterns. I can also see the need to discuss transport vs query language differences.
>
> Are we after both SQL's DML and DDL abilities ? Beyond accessibility and exploration, SQL often comes with mature tooling for schema change management. Cassandra supports online schema changes (e.g., ALTER TABLE), but cross-table/primary-key changes remain constrained. A SQL interface alone won’t ‘solve’ this: it’s about migration tooling and engine capabilities; changing data models at-scale faces separate challenges.
>
> Especially outside of early-stage apps and ad-hoc exploration I find SQL less interesting and its ergonomics less aligned with Cassandra’s runtime performance model. That doesn't make me opposed to the endeavour of SQL compatibility; it pushes me on the why question a bit more, for clarity on how it aligns with our strengths.
