Re: [DISCUSS] SQL support in Cassandra

Chris Lohfink Tue, 04 Nov 2025 09:38:14 -0800

Just throwing my 2 cents in. I'm probably in the unpopular camp of wanting
to move in the other direction, towards a gRPC endpoint that is even more
restrictive than CQL. This comes more from the standpoint of needing to clean
up after mistakes (application/modeling etc., not Cassandra) than from the
standpoint of trying to sell people on using the database. I would
prefer to see all the features and endpoints we provide work well without
breaking than make cool demos and feature bullet points. That said I know
in order for a database to be successful we need the cool feature sets as
well.  CQL works for now and deprecating that would be an absolute
nightmare for people *already* using it (i.e. the Thrift migration was not fun
for anyone). I say create a new entrypoint or layer, mark it experimental
and allow operators to disable it but leave the existing CQL interface
alone.


Chris

On Tue, Nov 4, 2025 at 10:53 AM Isaac Reath <[email protected]> wrote:

> I share Joey's opinions on this. Many features that resemble SQL (e.g.,
> indexes, materialized views) come with caveats that stem from
> their implementation details rather than the query language itself. If we
> expose these same features through SQL as they are today, I think we'd risk
> setting users up for disappointment, since they will come in with implicit
> expectations about how a given SQL feature should work based on their
> previous experience and more often than not we won't meet that expectation.
> At least with CQL we set the expectation that this is a different database,
> where familiar concepts might behave differently than you would expect.
>
> That said, in terms of a long term direction, I think having SQL support
> is a good guiding light and implementing it as a stateless component as
> Jeff suggests would help make this easier to realize.
>
> On Tue, Nov 4, 2025 at 10:23 AM Joseph Lynch <[email protected]>
> wrote:
>
>> Removing CQL is, in my opinion, completely off the table. When we
>> deprecated Thrift and gave CQL as the new query language, we imposed
>> significant pain on our existing functional Thrift applications to migrate
>> to it - I feel we should not hurt our users like that again.
>>
>> I worry that we already struggle to implement the current surface area of
>> CQL correctly and in a way that scales safely. For example, CQL allows us
>> to create arbitrarily large partitions, but large partitions and large
>> columns continue to be something our storage engine can't currently handle
>> well. CQL allows us to create secondary indices for improved filter support
>> but few can (or at least we struggle to) safely use them in production. We
>> still struggle with how page timeouts, hedges and retries work in an
>> idempotent and reliable way in our current protocol - although CQL at least
>> gives us a path to implementing those.
>>
>> I wonder if we should focus on being excellent at the basic write and
>> read operations we already support before adding more complexity at the API
>> layer. I am excited by the recent proposals around unbounded partitions,
>> byte ordered partitioner with safe data movement, ability to execute
>> analytics queries efficiently via a separate columnar representation etc
>> ... and *all* of those and more would likely be *required* to tackle SQL
>> in any meaningful way.
>>
>> The surface area of SQL is much much wider, requiring functional
>> implementation of all of that plus joins, interactive transactions and
>> more. The SQL protocol itself is also quite poor for reliable communication
>> and rarely has performant async clients with size based pagination, per
>> page timeouts, per page hedging, incremental progress over a streaming
>> async interface, pagination resumption, etc ...  A lot of this difficulty
>> stems from the protocol often being tied to TCP connections and the
>> inherently unbounded complexity of the read interface.
>>
>> I guess I'm saying, I think we should prioritize succeeding at the API
>> scope we already have before adding more. Deferring to standard SQL syntax
>> or naming when we can just seems like a good idea (why reinvent concepts),
>> but I don't think the friction with CQL is because it's not SQL, I think
>> it's because users can't tell what works and what doesn't work.
>>
>> -Joey
>>
>> On Tue, Nov 4, 2025 at 8:42 AM Josh McKenzie <[email protected]>
>> wrote:
>>
>>> +1 to Mick and Aleksey. I think the key for me was this:
>>>
>>> One is Cassandra’s wide-partition model with flexible clustering
>>> columns, which supports very large, ordered partitions (e.g. time-series
>>> and efficient range scans), rather than a strictly normalised, join-centric
>>> model. These patterns don’t always map cleanly to SQL semantics, and CQL’s
>>> query-driven, table-per-query modelling helps move users toward designs
>>> that scale predictably.
>>>
>>>
>>> We'd need really robust EXPLAIN / EXPLAIN ANALYZE support (see here
>>> <https://www.postgresql.org/docs/current/sql-explain.html>) for users
>>> to be able to make sense of how their SQL queries translate into underlying
>>> disk access patterns. Having a wide-open field of full SQL compliance that they
>>> then need to understand how to constrain to get horizontal scale out of it
>>> would be *much more challenging* than the already somewhat "new"
>>> cognitive muscle our users have to build to realize that horizontal scaling
>>> of data access doesn't come free.
>>>
>>> I think that would give us a future state of "Use SQL when you need /
>>> want a lot of expressivity, use CQL when you need to be constrained to
>>> language primitives that keep your data access scalable". The part that
>>> gets me wary here is how we've run into pain in the past trying to be both
>>> a database that allows more query expressivity (ALLOW FILTERING, legacy 2i
>>> come to mind) and a database that also wants horizontal scale.
>>>
>>> I'd love us to be able to have our cake and eat it too but I don't know
>>> if that's possible. So at the very least I'd advocate for SQL + CQL going
>>> forward, or SQL + a constrained "CQL-like" mode that gives the same
>>> constraints CQL does today on modeling that guide people towards that very
>>> partitionable path.
>>>
>>> On Tue, Nov 4, 2025, at 8:12 AM, Aleksey Yeshchenko wrote:
>>>
>>> I don’t mind us implementing some Postgres syntax support in some
>>> capacity, but I do not like the idea of limiting what Cassandra is allowed
>>> to do, or expose via CQL, to what is expressible by Postgres’s SQL.
>>>
>>> Many moons ago, before we started work on native protocol and CQL, I
>>> could perhaps see a bigger benefit to going the Postgres route - for the client
>>> protocol and the language. We could piggyback on existing client
>>> infrastructure and SQL familiarity. But at this stage, when we have already
>>> made the effort to develop decent drivers, and CQL is fleshed out, and C*
>>> is quite mature overall, how much would we gain from this transition?
>>>
>>> I’m broadly with Mick here. And I support using Postgres’ SQL as
>>> inspiration for implementing new CQL features wherever it makes sense -
>>> it’s something we’ve been doing for a decade already. But I don’t believe
>>> that deprecating CQL is the way to go at this point.
>>>
>>> > On 4 Nov 2025, at 06:38, Mick <[email protected]> wrote:
>>> >
>>> >
>>> >
>>> >> On 3 Nov 2025, at 20:32, Joel Shepherd <[email protected]> wrote:
>>> >>
>>> >> At the same time, my personal opinion is that if SQL compatibility is
>>> pursued, then the end game should be to deprecate CQL. That will probably
>>> take years, but at the limit I don't see a lot of benefit to supporting
>>> both.
>>> >
>>> >
>>> >
>>> > We want SQL, but _why_ (in all its nuances) do we want SQL ?  A lot is
>>> obvious, but it is a very broad question.
>>> >
>>> > The adoption and standardisation benefits are obvious, but CQL has
>>> strengths relative to SQL in Cassandra’s context.
>>> >
>>> > One is Cassandra’s wide-partition model with flexible clustering
>>> columns, which supports very large, ordered partitions (e.g. time-series
>>> and efficient range scans), rather than a strictly normalised, join-centric
>>> model. These patterns don’t always map cleanly to SQL semantics, and CQL’s
>>> query-driven, table-per-query modelling helps move users toward designs
>>> that scale predictably.
>>> >
>>> > I can see CQL continuing as Cassandra’s high-throughput, query-driven
>>> DSL, while we pursue SQL compatibility.  I appreciate Dinesh’s ‘lanes’
>>> framing, e.g. eventually default to a SQL interface (with Accord) for the
>>> broadest UX, while CQL remains a high-throughput path.
>>> >
>>> > Should we also be discussing storage-engine implications ?
>>> Cassandra’s LSMT/SSTable design optimises write paths, while SQL presents
>>> a logical view without constraining physical layout, so data on disk stays
>>> optimised for dominant access patterns.  I can also see the need to discuss
>>> transport vs query language differences.
>>> >
>>> > Are we after both SQL's DML and DDL abilities ?  Beyond accessibility
>>> and exploration, SQL often comes with mature tooling for schema change
>>> management. Cassandra supports online schema changes (e.g., ALTER TABLE),
>>> but cross-table/primary-key changes remain constrained. A SQL interface
>>> alone won’t ‘solve’ this: it’s about migration tooling and engine
>>> capabilities; changing data models at-scale faces separate challenges.
>>> >
>>> > Especially outside of early-stage apps and ad-hoc exploration I find
>>> SQL less interesting and its ergonomics less aligned with Cassandra’s
>>> runtime performance model.  That doesn't make me opposed to the endeavour
>>> of SQL compatibility, it pushes me on the why question a bit more for
>>> alignment clarity to our strengths.
>>>
>>>
>>>
>>>
