All of my text below largely extends your question of syntax in a few directions: - What is the user experience of trying to run different statements with this syntax? - How do transactions interact with other Cassandra constructs? - What are the execution semantics of these statements? which I do acknowledge is a moderate re-scoping of the question.
Also, please take my understanding of existing CQL and DDL constructs with an impractically large grain of salt. Undesireable Transactions ------------------------- I tried to match CQL docs up against a number of ways of writing statements which Accord wouldn't like, or users might not like the effect of running. I'm assuming it'd be good to think through how one would express the error message or guidance given to users? Or at least just making sure I understand correctly what is writable but not executable or desirable. === Likely Unexecutable All the cases here are predicated on the lack of automatic reconnaissance transaction support. 1. Dependant SELECTs CREATE TABLE users (name text primary key, home_state text); CREATE TABLE states (name text primary key, population int); BEGIN TRANSACTION; /*1*/ SELECT home_state FROM users WHERE name='blake' AS user; /*2*/ SELECT population FROM states WHERE name=user.home_state AS state; COMMIT TRANSACTION; The primary key for SELECT (2) depends on a value produced by SELECT (1), which results in not being able to produce the full read conflict set ahead of time for Accord. 2. Dependant UPDATEs CREATE TABLE users (name text primary key, home_state text); CREATE TABLE states (name text primary key, population int); BEGIN TRANSACTION; SELECT home_state FROM users WHERE name='blake' AS user; UPDATE states SET population += 1 WHERE name=user.home_state AS state; COMMIT TRANSACTION; The primary key for UPDATE depends on a value produced by SELECT, which results in not being able to produce the full write conflict set ahead of time for Accord. 3. UPDATE from secondary index (or SASI) CREATE TABLE users (id int primary key, name text, home_state text); CREATE INDEX users_by_name ON users (name); BEGIN TRANSACTION; UPDATE users SET miles_driven += 30 WHERE name='blake'; COMMIT TRANSACTION; This is just a rephasing of (2), but hiding the SELECT behind an implict query to the secondary index for the primary key. (But an UPDATE from a covering index would be okay!) 4. The presence of a materialized view which can implicitly add any/all of the above CREATE TABLE users ( name text primary key, miles_driven int, state text, ); CREATE MATERIALIZED VIEW users_by_home_state_by_miles AS SELECT * FROM users WHERE home_state IS NOT NULL PRIMARY KEY (home_state, miles_driven, name); BEGIN TRANSACTION; UPDATE users SET miles_driven += 30 WHERE name='blake'; COMMIT TRANSACTION; This is a rephrasing of (2), but the UPDATE is to the materialized view, and the SELECT is to get miles_driven out of the UPDATE on users. Some materialized views would be fine to transactionally update though. === Poor Performance 5. UPDATE with predicate on non-primary key CREATE TABLE users ( id int primary key, name text, miles_driven int ); BEGIN TRANSACTION; UPDATE users SET miles_driven += 30 WHERE name='blake'; COMMIT TRANSACTION; As now any transaction which touches the 'blake' row in `users` is going to have to wait behind a full table scan completing in Accord's transaction executor. Then any transaction which conflicts with one of those would also have to wait, and eventually snowball into a cascading stall in transaction processing. Support for these kinds of things could be useful to some users though? It might be wise to consider extending Accord with a table-level lock concept to avoid having to maintain conflict information on every key in the table individually. 6. Large SELECTs Are Actually Okay But Look Like They Shouldn't Be CREATE TABLE users (name text primary key, state text, miles_driven int); BEGIN TRANSACTION; SELECT sum(miles_driven) FROM users WHERE state='Ohio'; COMMIT TRANSACTION; As read-only transactions should be cheap, and I don't think the computation would be much more notably expensive than the non-transactional version of this. However, what maybe feels similar to a user: BEGIN TRANSACTION; SELECT sum(miles_driven) FROM users WHERE state='Ohio'; UPDATE miles_by_state SET total=sum(miles_driven) WHERE state='Ohio'; COMMIT TRANSACTION; does mean we're back in the bad idea category. === I Have Literally No Idea 7. Triggers What are the transactional guarantees of triggers? These are implemented in Java, so that'd just be outright banned by Accord? Unless triggers can have an API to spit out extra conflict ranges for the partition they live on? Sounds like an Accord Transactions v2 problem. :p Wide Rows and Projection ------------------------ Slightly reducing your example down, I'm curious if you see these two having exactly the same execution: BEGIN TRANSACTION; SELECT * FROM users WHERE name='blake' AS user; UPDATE users SET miles_driven = user.miles_driven + 30 WHERE name='blake'; COMMIT TRANSACTION; BEGIN TRANSACTION; UPDATE users SET miles_driven += 30 WHERE name='blake'; COMMIT TRANSACTION; When we assume that the users table is defined as something like: CREATE TABLE users ( name text PRIMARY KEY, account_active boolean, miles_driven int, last_update timestamp, -- [ many other columns ] 1GB_of_personal_thoughts text, ); Which hopefully outlines the case where a naive implementation of `SELECT * ... AS user` would read >1GB per row, only to discard most of it by the write phase. Do you plan on analyzing the CQL transaction in full to calculate the minimal set of columns needed from each table? Or does the execution expected to follow what is written? This also maybe turns into a question of if someone wishes to build a CQL validator, how much parsing and processing of CQL statements should they also need to do? Random Syntax Thoughts ---------------------- The consistent theme being "what if CQL read a little more SQL-y?" === RETURNING `UPDATE ... RETURNING (columns)` is a non-standard but commonly implemented SQL extension. Adding support to CQL for it feels like a nice way to remove the need for analysis to compute which columns need to be read from updated tables. BEGIN TRANSACTION; UPDATE users SET miles_driven += 30 WHERE name='blake' RETURNING (account_active) AS users; COMMIT TRANSACTION IF users.account_active; And for each DML type... - INSERT ... RETURNING returns inserted data (useful for defaulted or autoincrement columns). - UPDATE ... RETURNING returns the modified data. - DELETE ... RETURNING returns the now-deleted data. Instead of a RETURN statement, one could use a similar clause on the COMMIT: BEGIN TRANSACTION; UPDATE users SET miles_driven += 30 WHERE name='blake' RETURNING (name, account_active) AS user; COMMIT TRANSACTION IF users.account_active RETURNING (user.name); === WITH SQL's `WITH` construct almost lets one rewrite these statements in a different form BEGIN TRANSACTION; WITH user AS (SELECT * FROM users WHERE name='blake'), car AS (SELECT * FROM cars WHERE model='pinto'), updated_users AS (UPDATE users SET miles_driven = user.miles_driven + 30 WHERE name='blake'), updated_cars AS (UPDATE cars SET miles_driven = car.miles_driven + 30 WHERE model='pinto'), COMMIT TRANSACTION IF car.is_running; Which reads oddly to me, partly because subqueries are only a SQL thing, but I maybe do like the name being leading instead of trailing for ease of finding the name? BEGIN TRANSACTION; WITH user AS SELECT * FROM users WHERE name='blake'; WITH car AS SELECT * FROM cars WHERE model='pinto'; UPDATE users SET miles_driven = user.miles_driven + 30 WHERE name='blake'; UPDATE cars SET miles_driven = car.miles_driven + 30 WHERE model='pinto'; COMMIT TRANSACTION IF car.is_running; But I also see that WITH was already used in CQL to specify options, so that'd maybe be more confusing in context. Cheers, Alex