Re: [DISCUSS] SQL support in Cassandra

Jordan West Fri, 07 Nov 2025 08:43:26 -0800

A bit late to this convo but I generally support the POV Joey and Chris
shared. I think SQL can be interesting as a separate layer.


Stepping back I think there is a larger conversation: the initial email
implicitly positions Cassandra’s success as trying to compete directly
with Aurora/CRDB/etc on ease of adoption. I’m not personally sure that’s
the best long term strategy (but I think even that warrants a separate
discussion so I’ll pause here). I know this is one component of a larger
vision but it might be prudent to align on where the community wants to
position the database before we talk about how to get there or we risk
building a hodge-podge that isn’t great at anything.

Jordan

On Wed, Nov 5, 2025 at 10:48 Patrick McFadin <[email protected]> wrote:

> I agree on splitting this up. I'll do that today.
>
> Patrick
>
> On Wed, Nov 5, 2025 at 10:44 AM Dinesh Joshi <[email protected]> wrote:
>
>> There are two distinct conversations in this thread.
>>
>> 1. What does the evolution of CQL Syntax look like?
>> 2. What is the path to bring SQL to Cassandra?
>>
>> I suggest we fork 2 discuss threads to have a focused discussion on each
>> topic.
>>
>> Thanks,
>>
>> Dinesh
>>
>> On Wed, Nov 5, 2025 at 10:29 AM David Capwell <[email protected]> wrote:
>>
>>> My personal stance is that new work should look at existing syntax and
>>> ask the question “why are we different”, if the answer is “I prefer this”
>>> or “I didn’t have the time”, I want to push back against this and argue for
>>> SQL / Postgres w/e possible.  If the answer is “correctness” or
>>> “performance” I am far more open to do things our own way.
>>>
>>> Given the above, I don’t like having a requirement we must be SQL /
>>> Postgres compliant, but I do think it's a good guidepost to keep in mind
>>> when we are doing something new.
>>>
>>> I worry that we already struggle to implement the current surface area
>>> of CQL correctly and in a way that scales safely.
>>>
>>>
>>> This has been a big issue for me over the past few years, when we
>>> implement features correctness / semantics have not historically been given
>>> the thought I feel that they deserve; we have so many weird behaviors that
>>> leak into user land (batch / CAS failures come to mind as they are
>>> constantly making me sad… why is the “short” type variable length? WHY DO
>>> WE HAVE MEANINGLESS EMPTYNESS!!!!); we have gotten much better over the
>>> years though… not all negative here =)
>>>
>>> SQL has been building its surface area for decades and trying to catch
>>> up is a significant effort and how to make things correct and performant
>>> becomes an issue.  In the latest spec there is now support for graph
>>> queries, so signing up to be compatible means we need to implement the below
>>>
>>> SELECT *
>>> FROM GRAPH_TABLE(my_graph
>>>     MATCH (a IS person)-[e IS friends]->(b IS person WHERE b.name = 'Alice')
>>>     WHERE a.name = 'Mary'
>>>     COLUMNS (a.name AS person_a, b.name AS person_b)
>>> );
>>>
>>> That above example is just a simple example; it gets far more complex
>>> and would be harder for C* to support.
>>>
>>>
>>> I would be curious to see a gap analysis between CQL and SQL
>>> that includes the differences in behaviors. I suspect that it will bring a
>>> few surprises and provide some more solid foundation to this discussion.
>>>
>>>
>>> I think this is a good starting point.  There are some nice things in
>>> SQL missing in C* that could be implemented without a ton of risk, and
>>> opening up the discussion around these areas makes sense to me.
>>>
>>> Off the top of my head, here are basic queries that work in SQL but not
>>> CQL, and there is a very low level of risk in supporting them.
>>>
>>> SELECT 1 -- simple query to test if the connection is still live
>>>
>>> SELECT func(42) FROM system.peers; -- this has led someone I know to
>>> have to implement functions that return constants specifically to work
>>> around this limitation…
>>>
>>>
>>>
>>> On Nov 5, 2025, at 9:15 AM, Jeff Jirsa <[email protected]> wrote:
>>>
>>> CQL just to demonstrate it’s possible
>>>
>>> Fat node style would indeed be faster but I'm mostly proving that it's
>>> functional
>>>
>>> On Nov 5, 2025, at 8:55 AM, Joseph Lynch <[email protected]> wrote:
>>>
>>> 
>>> I very much like Jeff, Josh et al.'s proposals around the pluggable
>>> stateless API layer. Also I agree with Chris I would prefer a simpler API
>>> not a more complex one for our applications to couple to e.g. the Java
>>> stdlib. This also sets up a really nice path where the community members
>>> can build the layers that make sense first out-of-tree, and as a project we
>>> can choose the successful ones to bring in-tree. Whichever API those layers
>>> couple to would be a new semi-public interface though which has to be
>>> weighed.
>>>
>>> Jeff I am curious, in that prototype you are hacking are you interacting
>>> directly with the internode protocol and verb system or going through CQL?
>>> I imagine there could be some strengths to going straight to the internode?
>>>
>>> -Joey
>>>
>>> On Tue, Nov 4, 2025 at 3:49 PM Josh McKenzie <[email protected]>
>>> wrote:
>>>
>>>> Again from
>>>>
>>>> Right. I'm just zooming out a bit more and applying that same logical
>>>> pattern broadly to other API language domains, not just SQL. But yes - your
>>>> point definitely stands.
>>>>
>>>> On Tue, Nov 4, 2025, at 6:42 PM, Patrick McFadin wrote:
>>>>
>>>> I’m grooving on what “Cloud Native Jeff” is saying here and I would
>>>> like to see where this could go. If we use a well established library like
>>>> Calcite, then there is no API to maintain. We might find parts of Cassandra
>>>> along the way we could alter to make it easier to integrate, but so far
>>>> that’s just a premature optimization.
>>>>
>>>> Suuuuper interested to see the TPC-C when you have it, Jeff.
>>>>
>>>> > On Nov 4, 2025, at 3:25 PM, Jeff Jirsa <[email protected]> wrote:
>>>> >
>>>> >
>>>> >
>>>> > On 2025/11/04 22:32:08 Josh McKenzie wrote:
>>>> >>
>>>> >> So I guess what I'm noodling on here is a superset of what Patrick
>>>> is w/a slight modification, where we double down on CQL as being the "low
>>>> level high performance" API for C*, and have SQL and other APIs built on
>>>> top of that.
>>>> >>
>>>> >
>>>> > Again from
>>>> https://lists.apache.org/thread/hdwf0g7pnnko7m84yxn87lybnlcdvn50
>>>> >
>>>> >> Or is it building a native SQL implementation stateless on top of a
>>>> backing ordered (ByteOrderedPartitioner), transactional (accord), key-value
>>>> cassandra cluster ? It’s an extra hop, but trying to adjust the existing
>>>> grammar / DDL to fit into a language it always mimicked but never
>>>> implemented faithfully feels like a bumpy road, where there are many
>>>> successful existence proofs for building it stateless a layer above.
>>>> >
>>>> > TiKV / TiDB, FoundationDB, etc, etc, etc.
>>>> >
>>>> > If you have a transactional, performant, ordered KV store, you can
>>>> build almost any high-level database on top of it. You can expose even
>>>> lower layer primitives (like placement) to optimize for it.
>>>>
>>>>
>>>>
>>>>
>>>
