On 11/3/2025 10:38 PM, Mick wrote:
On 3 Nov 2025, at 20:32, Joel Shepherd <[email protected]> wrote:
At the same time, my personal opinion is that if SQL compatibility is pursued,
then the end game should be to deprecate CQL. That will probably take years,
but at the limit I don't see a lot of benefit to supporting both.
We want SQL, but _why_ (in all its nuances) do we want SQL? A lot is obvious,
but it is a very broad question.
The adoption and standardisation benefits are obvious, but CQL has strengths
relative to SQL in Cassandra’s context.
IMO this is the crux of the debate. If Patrick's hypothesis (from his
CoC talk IIRC) that there is a consolidation underway in the
database/storage world, including API consolidation, is correct, then
conforming to one of the "standard" data APIs could make more people
more comfortable with staking their future on Cassandra.
But if that requires building unsatisfying features in Cassandra (joins
with meh performance, "weird" transaction semantics for people coming
from an RDBMS background, etc.), or makes it harder to use existing
Cassandra functionality, then there is real risk of diluting Cassandra's
strengths and harming its reputation.
One is Cassandra’s wide-partition model with flexible clustering columns, which
supports very large, ordered partitions (e.g. time-series and efficient range
scans), rather than a strictly normalised, join-centric model. These patterns
don’t always map cleanly to SQL semantics, and CQL’s query-driven,
table-per-query modelling helps move users toward designs that scale
predictably.
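To make the wide-partition idea concrete, here is a minimal sketch in Python. It is a toy in-memory model, not Cassandra's storage engine: the class and method names (WidePartitionTable, insert, range_scan) are made up for illustration. It assumes rows are grouped by partition key and kept ordered by a single clustering key, so a range scan inside one partition is two binary searches plus a contiguous slice.

```python
import bisect
from collections import defaultdict


class WidePartitionTable:
    """Toy in-memory model (hypothetical, not Cassandra's storage engine)."""

    def __init__(self):
        # partition key -> parallel lists, both kept sorted by clustering key
        self._keys = defaultdict(list)
        self._values = defaultdict(list)

    def insert(self, partition_key, clustering_key, value):
        keys = self._keys[partition_key]
        values = self._values[partition_key]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(keys) and keys[i] == clustering_key:
            values[i] = value  # same clustering key: overwrite (upsert)
        else:
            keys.insert(i, clustering_key)
            values.insert(i, value)

    def range_scan(self, partition_key, start, end):
        """Return rows with start <= clustering key < end from ONE partition."""
        keys = self._keys.get(partition_key, [])
        values = self._values.get(partition_key, [])
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_left(keys, end)
        return list(zip(keys[lo:hi], values[lo:hi]))
```

For example, after inserting readings for "sensor-1" at clustering keys 30, 10 and 20, range_scan("sensor-1", 10, 30) returns the rows for 10 and 20 in order, without touching any other partition. A join-centric model has no equivalent guarantee that related rows are co-located and pre-sorted.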
I can see CQL continuing as Cassandra’s high-throughput, query-driven DSL,
while we pursue SQL compatibility. I appreciate Dinesh’s ‘lanes’ framing, e.g.
eventually default to a SQL interface (with Accord) for the broadest UX, while
CQL remains a high-throughput path.
This is where I could use some education. What are a couple examples of
things that make CQL better suited for high throughput than SQL? Some of
the key differences that I can see are:
* CQL is consensus-aware; SQL isn't.
* CQL is partition- and clustering-key aware; SQL has a single primary key
concept.
* CQL discourages cross- or multi-partition operations; SQL doesn't.
* CQL doesn't support relational joins, referential integrity, etc.
(cross-partition and cross-table operations); SQL does.
Are there others that'd be good to think about?
Syntactically, the partition key vs primary key difference and
consensus-awareness seem like the hardest to deal with in SQL: I'm not
sure how to do it and stay conformant (not introduce Cassandra-specific
syntax). The other two, I think, could be addressed either by not
offering support (SQL w/o join syntax, ref integrity syntax, etc.) or by
offering constrained support (e.g. you can join but at least one side of
the join must be constrained to a single partition).
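The "constrained support" idea can be sketched as a planner-side check. This is only an illustration in Python under stated assumptions: TableRef and join_allowed are hypothetical names, not part of Cassandra or any SQL front-end, and "constrained to a single partition" is assumed to mean that every partition key column has an equality constraint.

```python
from dataclasses import dataclass, field


@dataclass
class TableRef:
    """One side of a join, as a hypothetical planner might see it."""
    name: str
    partition_key: tuple  # names of the partition key columns
    equality_constraints: dict = field(default_factory=dict)  # column -> value

    def is_single_partition(self):
        # Assumption: a side targets one partition only if every
        # partition key column is pinned by an equality constraint.
        return all(col in self.equality_constraints for col in self.partition_key)


def join_allowed(left, right):
    """Accept the join only if at least one side is a single partition."""
    return left.is_single_partition() or right.is_single_partition()
```

With this rule, a join between orders constrained by customer_id = 42 and an unconstrained events table would be accepted, while a join of two unconstrained tables would be rejected up front instead of running with poor performance.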
Would love to learn if there are other throughput-related nuances to CQL
that wouldn't translate easily to SQL.
<snip a bunch of other insightful questions, for brevity>
In the spirit of "respect what came before", I'm asking the next
question not to throw shade on CQL or its creators, but to explore
doubling-down on CQL.
If CQL didn't try as hard to look like SQL, could it be a better API for
Cassandra? For example, if the syntax required you to specify partition
key constraints explicitly, just like you have to be explicit with
"ALLOW FILTERING" today, could CQL become a more optimal language for
Cassandra?
If someone is going to the trouble of building a Cassandra front-end to
support a different query language to make Cassandra more appealing (I
think that's ultimately the goal), would it be better to deprioritize
SQL conformance and instead design a language specifically for
wide-partition, high throughput, distributed, eventually consistent
databases?
That doesn't make me opposed to the endeavour of SQL compatibility; it pushes
me on the "why" question a bit more, so that we stay aligned with our strengths.
Definitely agree that both approaches are worth consideration.
Thanks -- Joel.