When a user hears PostgreSQL compatibility, the implicit assumption they have is a full bug-for-bug compatibility with Postgres. I don't think that's what you mean here, is it?

On Tue, Nov 4, 2025 at 11:38 AM Jeff Jirsa <[email protected]> wrote:

> I started building a Postgres layer to convince myself it’s possible. It’s got joins, interactive transactions, mvcc, pg wire protocol, query planner, etc. it’s far enough along I can run tpc-c.
>
> The only cassandra change that was needed was a fix to accord for BOP variable length tokens serialized in the journal. The rest just works if you know how Postgres and Cassandra work.
>
> I’m running tpc-c to see how far from acceptable latency it is for a week of toy work but I’m about 95% sure that anyone who knows how databases work can implant a Postgres layer on cassandra for real as soon as accord launches
>
> I don’t think the project needs to build this into cassandra. There are a lot of reasons not to do that.
>
> On Nov 4, 2025, at 11:18 AM, Josh McKenzie <[email protected]> wrote:
>
> Good point Joey; I was rather focused on the ergonomics of implicit constraint that come with CQL vs. SQL and the gap we'd have to bridge to make a SQL-centric world have the same design language as CQL today.
>
> We can't afford to drop CQL at this point unless we had an overwhelmingly bullet-proof CQL->SQL translation layer that didn't introduce new edge cases nor performance degradation compared to CQL directly today. Users would have to have the ability for existing CQL applications to Just Work when migrated onto some new paradigm where the existing CQL native protocol endpoints were deprecated. At that point we'd just be weighing the cost of maintaining a translation layer between API semantics vs. a translation layer between the native protocol and the storage engine we already have today; lot of work to just be where we are today IMO.
>
> We've learned the hard way that when you remove functionality from the database it hurts a lot of users in a lot of ways and we all discussed and broadly had a consensus to try not to remove anything going forward on the dev ML in the past year as I recall. Removing our core query language would be... quite the opposite of what we discussed and agreed to.
>
> Now - SQL layer on top of the storage engine? If people want to work on that I think it'd be great for our ecosystem. To Chris' point, I think there's probably appetite from users' perspectives to have different APIs to interact with data in the storage engine, be it gRPC, GraphQL, JSON, CQL over REST, CQL, SQL, etc. Us having a layer that allowed us to reasonably build in that functionality would be a net win.
>
> On Tue, Nov 4, 2025, at 12:36 PM, Chris Lohfink wrote:
>
> Just throwing my 2 cents in. I'm probably in the unpopular camp of wanting to move the other direction towards a grpc endpoint that is even more restrictive than cql. This is coming from a standpoint of needing to clean up after mistakes (application/modeling etc, not cassandra) rather than the standpoint of trying to sell people on using the database. I would prefer to see all the features and endpoints we provide work well without breaking than make cool demos and feature bullet points. That said I know in order for a database to be successful we need the cool feature sets as well. CQL works for now and deprecating that would be an absolute nightmare for people *already* using it (i.e. thrift migration was not fun for anyone). I say create a new entrypoint or layer, mark it experimental and allow operators to disable it but leave the existing CQL interface alone.
>
> Chris
>
> On Tue, Nov 4, 2025 at 10:53 AM Isaac Reath <[email protected]> wrote:
>
> I share Joey's opinions on this.
> Many features that resemble SQL (e.g., indexes, materialized views) come with caveats that stem from their implementation details rather than the query language itself. If we expose these same features through SQL as they are today, I think we'd risk setting users up for disappointment, since they will come in with implicit expectations about how a given SQL feature should work based on their previous experience and more often than not we won't meet that expectation. At least with CQL we set the expectation that this is a different database, where familiar concepts might behave differently than you would expect.
>
> That said, in terms of a long term direction, I think having SQL support is a good guiding light and implementing it as a stateless component as Jeff suggests would help make this easier to realize.
>
> On Tue, Nov 4, 2025 at 10:23 AM Joseph Lynch <[email protected]> wrote:
>
> Removing CQL is, in my opinion, completely off the table. When we deprecated Thrift and gave CQL as the new query language, we imposed significant pain on our existing functional Thrift applications to migrate to it - I feel we should not hurt our users like that again.
>
> I worry that we already struggle to implement the current surface area of CQL correctly and in a way that scales safely. For example, CQL allows us to create arbitrarily large partitions, but large partitions and large columns continue to be something our storage engine can't currently handle well. CQL allows us to create secondary indices for improved filter support but few can safely use them in production (or at least we struggle to). We still struggle with how page timeouts, hedges and retries work in an idempotent and reliable way in our current protocol - although CQL at least gives us a path to implementing those.
>
> I wonder if we should focus on being excellent at the basic write and read operations we already support before adding more complexity at the API layer. I am excited by the recent proposals around unbounded partitions, byte ordered partitioner with safe data movement, ability to execute analytics queries efficiently via a separate columnar representation etc ... and *all* of those and more would likely be *required* to tackle SQL in any meaningful way.
>
> The surface area of SQL is much much wider, requiring functional implementation of all of that plus joins, interactive transactions and more. The SQL protocol itself is also quite poor for reliable communication and rarely has performant async clients with size based pagination, per page timeouts, per page hedging, incremental progress over a streaming async interface, pagination resumption, etc ... A lot of this difficulty stems from the protocol often being tied to TCP connections and the inherently unbounded complexity of the read interface.
>
> I guess I'm saying, I think we should prioritize succeeding at the API scope we already have before adding more. Deferring to standard SQL syntax or naming when we can just seems like a good idea (why reinvent concepts), but I don't think the friction with CQL is because it's not SQL, I think it's because users can't tell what works and what doesn't work.
>
> -Joey
>
> On Tue, Nov 4, 2025 at 8:42 AM Josh McKenzie <[email protected]> wrote:
>
> +1 to Mick and Aleksey. I think the key for me was this:
>
> One is Cassandra’s wide-partition model with flexible clustering columns, which supports very large, ordered partitions (e.g. time-series and efficient range scans), rather than a strictly normalised, join-centric model. These patterns don’t always map cleanly to SQL semantics, and CQL’s query-driven, table-per-query modelling helps move users toward designs that scale predictably.
>
> We'd need really robust EXPLAIN / EXPLAIN ANALYZE support (see here <https://www.postgresql.org/docs/current/sql-explain.html>) for users to be able to make sense of how their SQL queries translate into underlying disk access patterns. Having a wide-open field of full SQL compliance that they then need to understand how to constrain to get horizontal scale out of it would be *much more challenging* than the already somewhat "new" cognitive muscle our users have to build to realize that horizontal scaling of data access doesn't come free.
>
> I think that would give us a future state of "Use SQL when you need / want a lot of expressivity, use CQL when you need to be constrained to language primitives that keep your data access scalable". The part that gets me wary here is how we've run into pain in the past trying to be both a database that allows more query expressivity (ALLOW FILTERING, legacy 2i come to mind) and a database that also wants horizontal scale.
>
> I'd love us to be able to have our cake and eat it too but I don't know if that's possible. So at the very least I'd advocate for SQL + CQL going forward, or SQL + a constrained "CQL-like" mode that gives the same constraints CQL does today on modeling that guide people towards that very partitionable path.
>
> On Tue, Nov 4, 2025, at 8:12 AM, Aleksey Yeshchenko wrote:
>
> I don’t mind us implementing some Postgres syntax support in some capacity, but I do not like the idea of limiting what Cassandra is allowed to do, or expose via CQL, to what is expressible by Postgres’s SQL.
>
> Many moons ago, before we started work on native protocol and CQL, I could perhaps see a bigger benefit to going the Postgres route - for the client protocol and the language. We could piggyback on existing client infrastructure and SQL familiarity.
> But at this stage, when we have already made the effort to develop decent drivers, and CQL is fleshed out, and C* is quite mature overall, how much would we gain from this transition?
>
> I’m broadly with Mick here. And I support using Postgres’ SQL as inspiration for implementing new CQL features wherever it makes sense - it’s something we’ve been doing for a decade already. But I don’t believe that deprecating CQL is the way to go at this point.
>
> On 4 Nov 2025, at 06:38, Mick <[email protected]> wrote:
> >
> >> On 3 Nov 2025, at 20:32, Joel Shepherd <[email protected]> wrote:
> >>
> >> At the same time, my personal opinion is that if SQL compatibility is pursued, then the end game should be to deprecate CQL. That will probably take years, but at the limit I don't see a lot of benefit to supporting both.
> >
> > We want SQL, but _why_ (in all its nuances) do we want SQL ? A lot is obvious, but it is a very broad question.
> >
> > The adoption and standardisation benefits are obvious, but CQL has strengths relative to SQL in Cassandra’s context.
> >
> > One is Cassandra’s wide-partition model with flexible clustering columns, which supports very large, ordered partitions (e.g. time-series and efficient range scans), rather than a strictly normalised, join-centric model. These patterns don’t always map cleanly to SQL semantics, and CQL’s query-driven, table-per-query modelling helps move users toward designs that scale predictably.
> >
> > I can see CQL continuing as Cassandra’s high-throughput, query-driven DSL, while we pursue SQL compatibility. I appreciate Dinesh’s ‘lanes’ framing, e.g. eventually default to a SQL interface (with Accord) for the broadest UX, while CQL remains a high-throughput path.
> >
> > Should we also be discussing storage-engine implications ?
> > Cassandra’s LSMT/SSTable design optimises write paths, while SQL presents a logical view without constraining physical layout, so data on disk stays optimised for dominant access patterns. I can also see the need to discuss transport vs query language differences.
> >
> > Are we after both SQL's DML and DDL abilities ? Beyond accessibility and exploration, SQL often comes with mature tooling for schema change management. Cassandra supports online schema changes (e.g., ALTER TABLE), but cross-table/primary-key changes remain constrained. A SQL interface alone won’t ‘solve’ this: it’s about migration tooling and engine capabilities; changing data models at-scale faces separate challenges.
> >
> > Especially outside of early-stage apps and ad-hoc exploration I find SQL less interesting and its ergonomics less aligned with Cassandra’s runtime performance model. That doesn't make me opposed to the endeavour of SQL compatibility, it pushes me on the why question a bit more for alignment clarity to our strengths.
