Re: [DISCUSS] SQL support in Cassandra

Joseph Lynch Tue, 04 Nov 2025 07:23:20 -0800

Removing CQL is, in my opinion, completely off the table. When we
deprecated Thrift and introduced CQL as the new query language, we imposed
significant pain on users by forcing existing, functional Thrift
applications to migrate to it - I feel we should not hurt our users like
that again.


I worry that we already struggle to implement the current surface area of
CQL correctly and in a way that scales safely. For example, CQL allows us
to create arbitrarily large partitions, but large partitions and large
columns continue to be something our storage engine can't currently handle
well. CQL allows us to create secondary indices for improved filter support,
but few can (or at least we struggle to) safely use them in production. We
still struggle with how page timeouts, hedges and retries work in an
idempotent and reliable way in our current protocol - although CQL at least
gives us a path to implementing those.

I wonder if we should focus on being excellent at the basic write and read
operations we already support before adding more complexity at the API
layer. I am excited by the recent proposals around unbounded partitions,
a byte-ordered partitioner with safe data movement, the ability to execute
analytics queries efficiently via a separate columnar representation, etc.
... and *all* of those and more would likely be *required* to tackle SQL in
any meaningful way.

The surface area of SQL is much, much wider, requiring functional
implementation of all of that plus joins, interactive transactions and
more. The SQL protocol itself is also quite poor for reliable communication
and rarely has performant async clients with size-based pagination, per-page
timeouts, per-page hedging, incremental progress over a streaming async
interface, pagination resumption, etc. A lot of this difficulty stems from
the protocol often being tied to TCP connections and the inherently
unbounded complexity of the read interface.
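
To make the pagination properties above concrete, here is a minimal
Python sketch of size-based pages, a per-page timeout with retry, and
resumption from an opaque token. Everything in it is hypothetical:
ToyServer, fetch_page and read_all are illustration-only names, not a
Cassandra or driver API, and the timeout is simulated rather than
measured. The point is only that a page fetch which depends on nothing
but (token, max_bytes) can be retried or resumed without holding
connection state.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple

Row = Tuple[int, str]


@dataclass(frozen=True)
class Page:
    rows: List[Row]
    next_token: Optional[int]  # opaque resume position; None means exhausted


class ToyServer:
    """Hypothetical in-memory stand-in for a replica (not a real API)."""

    def __init__(self, rows: List[Row], fail_on_calls=()):
        self.rows = sorted(rows)
        self.calls = 0
        self.fail_on_calls = set(fail_on_calls)  # call numbers that "time out"

    def fetch_page(self, token: Optional[int], max_bytes: int,
                   timeout_s: float) -> Page:
        self.calls += 1
        if self.calls in self.fail_on_calls:
            # Simulated per-page timeout; a real client would enforce a deadline.
            raise TimeoutError(f"page fetch exceeded {timeout_s}s")
        i = 0 if token is None else token
        out: List[Row] = []
        used = 0
        while i < len(self.rows):
            size = len(self.rows[i][1].encode())
            # Size-based paging: stop once the byte budget would be exceeded,
            # but always return at least one row so the reader makes progress.
            if out and used + size > max_bytes:
                break
            out.append(self.rows[i])
            used += size
            i += 1
        return Page(out, i if i < len(self.rows) else None)


def read_all(server: ToyServer, max_bytes: int = 8, timeout_s: float = 0.5,
             max_attempts: int = 3) -> Iterator[Row]:
    """Yield every row, retrying each page independently on timeout."""
    token: Optional[int] = None
    while True:
        for attempt in range(1, max_attempts + 1):
            try:
                # Retrying with the same token is idempotent: the page is a
                # pure function of (token, max_bytes), not of a connection.
                page = server.fetch_page(token, max_bytes, timeout_s)
                break
            except TimeoutError:
                if attempt == max_attempts:
                    raise
        yield from page.rows
        if page.next_token is None:
            return
        token = page.next_token  # a caller could persist this and resume later


server = ToyServer([(i, "abcd") for i in range(5)], fail_on_calls={2})
result = list(read_all(server))
```

In this toy run the second fetch times out and is retried with the same
token, so the reader still sees all five rows in order without
restarting the whole query.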

I guess I'm saying, I think we should prioritize succeeding at the API
scope we already have before adding more. Deferring to standard SQL syntax
or naming when we can just seems like a good idea (why reinvent concepts?),
but I don't think the friction with CQL is because it's not SQL; I think
it's because users can't tell what works and what doesn't work.

-Joey

On Tue, Nov 4, 2025 at 8:42 AM Josh McKenzie <[email protected]> wrote:

> +1 to Mick and Aleksey. I think the key for me was this:
>
> One is Cassandra’s wide-partition model with flexible clustering columns,
> which supports very large, ordered partitions (e.g. time-series and
> efficient range scans), rather than a strictly normalised, join-centric
> model. These patterns don’t always map cleanly to SQL semantics, and CQL’s
> query-driven, table-per-query modelling helps move users toward designs
> that scale predictably.
>
>
> We'd need really robust EXPLAIN / EXPLAIN ANALYZE support (see here
> <https://www.postgresql.org/docs/current/sql-explain.html>) for users to
> be able to make sense of how their SQL queries translate into underlying
> disk access patterns. Having a wide-open field of full SQL compliance they
> then need to understand how to constrain to get horizontal scale out of it
> would be *much more challenging* than the already somewhat "new"
> cognitive muscle our users have to build to realize that horizontal scaling
> of data access doesn't come free.
>
> I think that would give us a future state of "Use SQL when you need / want
> a lot of expressivity, use CQL when you need to be constrained to language
> primitives that keep your data access scalable". The part that gets me wary
> here is how we've run into pain in the past trying to be both a database
> that allows more query expressivity (ALLOW FILTERING, legacy 2i come to
> mind) and a database that also wants horizontal scale.
>
> I'd love us to be able to have our cake and eat it too but I don't know if
> that's possible. So at the very least I'd advocate for SQL + CQL going
> forward, or SQL + a constrained "CQL-like" mode that gives the same
> constraints CQL does today on modeling that guide people towards that very
> partitionable path.
>
> On Tue, Nov 4, 2025, at 8:12 AM, Aleksey Yeshchenko wrote:
>
> I don’t mind us implementing some Postgres syntax support in some
> capacity, but I do not like the idea of limiting what Cassandra is allowed
> to do, or expose via CQL, to what is expressible by Postgres’s SQL.
>
> Many moons ago, before we started work on native protocol and CQL, I could
> perhaps see a bigger benefit to going the Postgres route - for the client protocol
> and the language. We could piggyback on existing client infrastructure and
> SQL familiarity. But at this stage, when we have already made the effort to
> develop decent drivers, and CQL is fleshed out, and C* is quite mature
> overall, how much would we gain from this transition?
>
> I’m broadly with Mick here. And I support using Postgres’ SQL as
> inspiration for implementing new CQL features wherever it makes sense -
> it’s something we’ve been doing for a decade already. But I don’t believe
> that deprecating CQL is the way to go at this point.
>
> > On 4 Nov 2025, at 06:38, Mick <[email protected]> wrote:
> >
> >
> >
> >> On 3 Nov 2025, at 20:32, Joel Shepherd <[email protected]> wrote:
> >>
> >> At the same time, my personal opinion is that if SQL compatibility is
> pursued, then the end game should be to deprecate CQL. That will probably
> take years, but at the limit I don't see a lot of benefit to supporting
> both.
> >
> >
> >
> > We want SQL, but _why_ (in all its nuances) do we want SQL ?  A lot is
> obvious, but it is a very broad question.
> >
> > The adoption and standardisation benefits are obvious, but CQL has
> strengths relative to SQL in Cassandra’s context.
> >
> > One is Cassandra’s wide-partition model with flexible clustering
> columns, which supports very large, ordered partitions (e.g. time-series
> and efficient range scans), rather than a strictly normalised, join-centric
> model. These patterns don’t always map cleanly to SQL semantics, and CQL’s
> query-driven, table-per-query modelling helps move users toward designs
> that scale predictably.
> >
> > I can see CQL continuing as Cassandra’s high-throughput, query-driven
> DSL, while we pursue SQL compatibility.  I appreciate Dinesh’s ‘lanes’
> framing, e.g. eventually default to a SQL interface (with Accord) for the
> broadest UX, while CQL remains a high-throughput path.
> >
> > Should we also be discussing storage-engine implications ?  Cassandra’s
> LSMT/SSTable design optimises write paths, while SQL presents a logical
> view without constraining physical layout, so data on disk stays optimised
> for dominant access patterns.  I can also see the need to discuss transport
> vs query language differences.
> >
> > Are we after both SQL's DML and DDL abilities ?  Beyond accessibility
> and exploration, SQL often comes with mature tooling for schema change
> management. Cassandra supports online schema changes (e.g., ALTER TABLE),
> but cross-table/primary-key changes remain constrained. A SQL interface
> alone won’t ‘solve’ this: it’s about migration tooling and engine
> capabilities; changing data models at-scale faces separate challenges.
> >
> > Especially outside of early-stage apps and ad-hoc exploration I find SQL
> less interesting and its ergonomics less aligned with Cassandra’s runtime
> performance model.  That doesn't make me opposed to the endeavour of SQL
> compatibility, it pushes me on the why question a bit more for alignment
> clarity to our strengths.
>
>
>
>
