[ https://issues.apache.org/jira/browse/CASSANDRA-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733311#comment-14733311 ]
Dennis Roppelt edited comment on CASSANDRA-10085 at 9/7/15 7:06 AM: -------------------------------------------------------------------- Hello Sylvain, Thank you for the quick and informative reply. I already anticipated that this would not be trivial to implement, and might go against some design principles created in C*. I also can see why this won't be implemented soon or at all. I will look into a different approach of modelling my data or see if another database is a better fit (obviously standard RDBMS will work for the model). It is a bit disappointing though, that this limitation was not mentioned in the documentation. Compared to other NoSQL-DBs, Cassandra usually has an outstandingly good documentation, with lots of examples and details. How would I suggest adding a hint about the limitiation with batch + conditional statements across tables? Creating a new issue and link to this one? was (Author: roppeldidu): Hello Sylvain, Thank you for the quick and informative reply. I already anticipated that this would not be trivial to implement, and might go against some design principles created in C*. I also can see why this won't be implemented soon or at all. I will look into a different approach of modelling my data or see if another database is a better fit (obviously standard RDBMS will work for the model). It is a bit disappointing though, that this limitation was not mentioned in the documentation. Compared to other NoSQL-DBs, Cassandra has an outstandingly good documentation. How would I suggest adding a hint about the limitiation with batch + conditional statements across tables? Creating a new issue and link to this one? > Allow BATCH with conditions to span multiple tables with same partition key > --------------------------------------------------------------------------- > > Key: CASSANDRA-10085 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10085 > Project: Cassandra > Issue Type: Improvement > Reporter: Dennis Roppelt > Labels: batch, conditional-statement, partitioning > > h4. Use case: > I want to use batch to keep data between tables synchronized with help of > conditionals. In particular, I want to have a ONE-to-ONE relationship in my > data model. I dont want to have an unintended upsert while trying to maintain > a consistent one-to-one relationship (write requests from other clients that > try to create a different relationship with one of the keys I inteded to > use), which is why I would like the IF NOT EXISTS when inserting data into > multiple tables in a batch. > But when trying to insert data in a batch-block with an IF NOT EXISTS, I get > the following response (reproduceable example further below): > {code} > cqlsh:testkeyspace> > BEGIN BATCH > INSERT INTO key1_to_key2 (partKey, key1, key2) values (1,2,3) IF NOT EXISTS; > INSERT INTO key2_to_key1 (partKey, key1, key2) values (1,2,3) IF NOT EXISTS; > APPLY BATCH; > Bad Request: Batch with conditions cannot span multiple tables{code} > The tables have a primary key on (partKey, key1) and (partKey, key2). Which > means that partKey in both cases is their partition key. Which in the example > also is the same value. > h4. Why I want to use a BATCH-statement that way: > In traditional databases, to design a one-to-one relationship, I would create > one table with two keys, both with a unique constraint: > {code:sql} > CREATE TABLE `myOneToOneTable` ( > `ID` int NOT NULL, > `KEY_ONE` int NOT NULL, > `KEY_TWO` int NOT NULL, > // some other fields > PRIMARY KEY (`ID`), > UNIQUE KEY `KEY_ONE`, > UNIQUE KEY `KEY_TWO` > ); > {code} > (simplified example taken from an [one-to-one example for > mysql|http://www.mkyong.com/mysql/how-to-define-one-to-one-relationship-in-mysql/]) > As far as I know, unique constraints are not supported in Cassandra, so > another way around it would be to maintain the relationshipt between two > tables, each sharing the same partition key. For which I would like Cassandra > to make sure that when I insert data, either both are inserted at the same > time or none (to prevent other writing client to insert a different > one-to-one relationship than me). > [According to the documentation, this is what BATCH-Statements are used > for|http://docs.datastax.com/en/cql/3.3/cql/cql_using/useBatch.html] > {quote} > Instead, using batches to synchronize data to tables is a legitimate > operation. > {quote} > I expected it to be able to do it, but as seen above, I cannot enforce INSERT > IF NOT EXISTS over multiple tables. > I do understand the background as to *why* there is a limitation coded that > way, so that not too many nodes have to take part in processing of the batch. > However, if the operations use > # primary keys in their statements for each table > # all primary keys contain the same value, thus resulting in the same hash > for partioning > # tables belong to the same keyspace > the batch statement should be able to be limited to the same amount of nodes > as if the batch contained INSERT IF NOT EXISTs for only one table. > h4. Steps to reproduce desired behaviour: > I used the [cassandra ova on virtual > box|http://www.planetcassandra.org/install-cassandra-ova-on-virtualbox/], but > also a Windows Setup with 2.2.0 > {code} > Connected to Cluster on a Stick at localhost:9160. > > [cqlsh 4.1.1 | Cassandra 2.0.7 | CQL spec 3.1.1 | Thrift protocol 19.39.0] > > Use HELP for help. > > cqlsh> CREATE KEYSPACE conditionalBatch > > WITH REPLICATION = {'class':'SimpleStrategy','replication_factor':1}; > cqlsh> use conditionalbatch ; > > cqlsh:conditionalbatch> CREATE TABLE key1_to_key2 ( > > partKey int, > key1 int, > key2 int, > PRIMARY KEY (partKey, key1) > ); > cqlsh:conditionalbatch> CREATE TABLE key2_to_key1 ( partKey int, key2 int, > key1 > int, PRIMARY KEY (partKey, key2) > ); > cqlsh:conditionalbatch> BEGIN BATCH > > INSERT INTO key1_to_key2 (partKey, key1, key2) VALUES (1,2,3) IF NOT EXISTS; > > INSERT INTO key2_to_key1 (partKey, Key1, key2) VALUES (1,2,3) IF NOT EXISTS; > > APPLY BATCH; > {code} > h4. What should happen: > Either succeding, or: > {code} > applied | partkey | key1 | key2| > > -----------+---------+------+------ > > false | 1 | 2 | 3 > > false | 1 | 2 | 3 > {code} > h4. What happens instead: > {code} > Bad Request: Batch with conditions cannot span multiple tables > {code} > In my opinion, this is a very crucial feature wich enables the users to > maintain unique constraints with a relatively small amount of work. There > certainly are other ways to maintain unique constrains, but I have not found > a trivial way within cassandra that lets me that easily. But maybe I have > overlooked something though. -- This message was sent by Atlassian JIRA (v6.3.4#6332)