A bit late to this convo but I generally support the POV Joey and Chris shared. I think SQL can be interesting as a separate layer.
Stepping back, I think there is a larger conversation: the initial email implicitly positions Cassandra’s success as trying to compete directly with Aurora/CRDB/etc. on ease of adoption. I’m not personally sure that’s the best long-term strategy (but I think even that warrants a separate discussion, so I’ll pause here). I know this is one component of a larger vision, but it might be prudent to align on where the community wants to position the database before we talk about how to get there, or we risk building a hodge-podge that isn’t great at anything.

Jordan

On Wed, Nov 5, 2025 at 10:48 Patrick McFadin <[email protected]> wrote:

> I agree on splitting this up. I'll do that today.
>
> Patrick
>
> On Wed, Nov 5, 2025 at 10:44 AM Dinesh Joshi <[email protected]> wrote:
>
>> There are two distinct conversations in this thread.
>>
>> 1. What does the evolution of CQL Syntax look like?
>> 2. What is the path to bring SQL to Cassandra?
>>
>> I suggest we fork 2 discuss threads to have a focused discussion on each
>> topic.
>>
>> Thanks,
>>
>> Dinesh
>>
>> On Wed, Nov 5, 2025 at 10:29 AM David Capwell <[email protected]> wrote:
>>
>>> My personal stance is that new work should look at existing syntax and
>>> ask the question “why are we different”, if the answer is “I prefer this”
>>> or “I didn’t have the time”, I want to push back against this and argue for
>>> SQL / Postgres w/e possible. If the answer is “correctness” or
>>> “performance” I am far more open to do things our own way.
>>>
>>> Given the above, I don’t like having a requirement we must be SQL /
>>> Postgres compliant, but I do think it's a good guide post to keep in mind
>>> when we are doing something new.
>>>
>>> I worry that we already struggle to implement the current surface area
>>> of CQL correctly and in a way that scales safely.
>>>
>>> This has been a big issue for me over the past few years, when we
>>> implement features correctness / semantics have not historically been given
>>> the thought I feel that they deserve; we have so many weird behaviors that
>>> leak into user land (batch / CAS failures come to mind as they are
>>> constantly making me sad… why is the “short” type variable length? WHY DO
>>> WE HAVE MEANINGLESS EMPTYNESS!!!!); we have gotten much better over the
>>> years though… not all negative here =)
>>>
>>> SQL has been building its surface area for decades and trying to catch
>>> up is a significant effort and how to make things correct and performant
>>> becomes an issue. In the latest spec there is now support for graph
>>> queries, so signing up to be compatible means we need to implement the below
>>>
>>> SELECT *
>>> FROM GRAPH_TABLE(my_graph
>>>   MATCH (a IS person)-[e IS friends]->(b IS person WHERE b.name = 'Alice')
>>>   WHERE a.name = 'Mary'
>>>   COLUMNS (a.name AS person_a, b.name AS person_b)
>>> );
>>>
>>> That above example is just a simple example, it gets far more complex
>>> and would be harder for C* to support.
>>>
>>> I would be curious to see a gap analysis between CQL and SQL
>>> that includes the differences in behaviors. I suspect that it will bring a
>>> few surprises and provide some more solid foundation to this discussion.
>>>
>>> I think this is a good starting point. There are some nice things in
>>> SQL missing in C* that could be implemented without a ton of risk, and
>>> opening up the discussion around these areas makes sense to me.
>>>
>>> Off the top of my head, here are basic queries that work in SQL but not
>>> CQL, and there are very low levels of risk to support.
>>>
>>> SELECT 1 -- simple query to test if the connection is still live
>>>
>>> SELECT func(42) FROM system.peers; -- this has led someone I know to
>>> have to implement functions that return constants specifically to work
>>> around this limitation…
>>>
>>> On Nov 5, 2025, at 9:15 AM, Jeff Jirsa <[email protected]> wrote:
>>>
>>> CQL just to demonstrate it’s possible
>>>
>>> Fat node style would indeed be faster but I'm mostly proving that it's
>>> functional
>>>
>>> On Nov 5, 2025, at 8:55 AM, Joseph Lynch <[email protected]> wrote:
>>>
>>> I very much like Jeff, Josh et al.'s proposals around the pluggable
>>> stateless API layer. Also I agree with Chris I would prefer a simpler API
>>> not a more complex one for our applications to couple to e.g. the Java
>>> stdlib. This also sets up a really nice path where the community members
>>> can build the layers that make sense first out-of-tree, and as a project we
>>> can choose the successful ones to bring in-tree. Whichever API those layers
>>> couple to would be a new semi-public interface though which has to be
>>> weighed.
>>>
>>> Jeff I am curious, in that prototype you are hacking are you interacting
>>> directly with the internode protocol and verb system or going through CQL?
>>> I imagine there could be some strengths to going straight to the internode?
>>>
>>> -Joey
>>>
>>> On Tue, Nov 4, 2025 at 3:49 PM Josh McKenzie <[email protected]>
>>> wrote:
>>>
>>>> Again from
>>>>
>>>> Right. I'm just zooming out a bit more and applying that same logical
>>>> pattern broadly to other API language domains, not just SQL. But yes - your
>>>> point definitely stands.
>>>>
>>>> On Tue, Nov 4, 2025, at 6:42 PM, Patrick McFadin wrote:
>>>>
>>>> I’m grooving on what “Cloud Native Jeff” is saying here and I would
>>>> like to see where this could go. If we use a well established library like
>>>> Calcite, then there is no API to maintain. We might find parts of Cassandra
>>>> along the way we could alter to make it easier to integrate, but so far
>>>> that’s just a premature optimization.
>>>>
>>>> Suuuuper interested to see the TPC-C when you have it, Jeff.
>>>>
>>>> > On Nov 4, 2025, at 3:25 PM, Jeff Jirsa <[email protected]> wrote:
>>>> >
>>>> > On 2025/11/04 22:32:08 Josh McKenzie wrote:
>>>> >>
>>>> >> So I guess what I'm noodling on here is a superset of what Patrick
>>>> >> is w/a slight modification, where we double down on CQL as being the "low
>>>> >> level high performance" API for C*, and have SQL and other APIs built on
>>>> >> top of that.
>>>> >>
>>>> >
>>>> > Again from https://lists.apache.org/thread/hdwf0g7pnnko7m84yxn87lybnlcdvn50
>>>> >
>>>> >> Or is it building a native SQL implementation stateless on top of a
>>>> >> backing ordered (ByteOrderedPartitioner), transactional (accord), key-value
>>>> >> cassandra cluster? It’s an extra hop, but trying to adjust the existing
>>>> >> grammar / DDL to fit into a language it always mimicked but never
>>>> >> implemented faithfully feels like a bumpy road, where there are many
>>>> >> successful existence proofs for building it stateless a layer above.
>>>> >
>>>> > TiKV / TiDB, FoundationDB, etc, etc, etc.
>>>> >
>>>> > If you have a transactional, performant, ordered KV store, you can
>>>> > build almost any high level database on top of it. You can expose even
>>>> > lower layer primitives (like placement) to optimize for it.
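
As a rough illustration of the last point in the quoted thread (a higher-level database layered on an ordered KV store), here is a minimal Python sketch. Everything in it is hypothetical: `OrderedKV`, `row_key`, `insert_row`, and `select_range` are made-up names, the store is an in-memory sorted list rather than a Cassandra cluster, and transactions are left out entirely. It only shows how an order-preserving key encoding lets a stateless layer turn a primary-key range query into a single range scan.

```python
import bisect
import json


class OrderedKV:
    """In-memory stand-in for an ordered key-value store (hypothetical)."""

    def __init__(self):
        self._keys = []    # byte keys, kept sorted
        self._values = {}  # key -> value bytes

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: bytes):
        return self._values.get(key)

    def scan(self, start: bytes, end: bytes):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


def row_key(table: str, pk: int) -> bytes:
    # Table name, a 0x00 separator, then the primary key as 8 big-endian
    # bytes, so byte order matches numeric order for non-negative integers.
    return table.encode() + b"\x00" + pk.to_bytes(8, "big")


def insert_row(kv: OrderedKV, table: str, pk: int, row: dict) -> None:
    kv.put(row_key(table, pk), json.dumps(row).encode())


def select_range(kv: OrderedKV, table: str, pk_from: int, pk_to: int) -> list:
    """Roughly: SELECT * FROM table WHERE pk >= pk_from AND pk < pk_to."""
    start, end = row_key(table, pk_from), row_key(table, pk_to)
    return [json.loads(value) for _, value in kv.scan(start, end)]
```

The query layer here holds no state of its own; all ordering comes from the key encoding, which is why an order-preserving partitioner matters for this style of design.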
