Wide Row with some indexes in CQL3
Hi,

In my new project, I want to create a wide-row column family with indexes in CQL3. Users can store multipurpose data backed by Cassandra. For example, my Profile CF holds user-defined properties as a wide row, and users can add indexes on the columns they want. Of course, we can create such a CF easily with Thrift, as below:

```
[default] create column family profile_thrift
    with comparator = 'UTF8Type'
    and default_validation_class = 'UTF8Type'
    and key_validation_class = 'UTF8Type';

[default] update column family profile_thrift
    with column_metadata = [{column_name : gender,
                             validation_class : UTF8Type,
                             index_type : 0,
                             index_name : idx_gender}];

[default] get profile_thrift where gender='m';
-------------------
RowKey: user002
=> (name=age, value=33, timestamp=1382524511829000)
=> (name=gender, value=m, timestamp=138252451927)
=> (name=name, value=bob, timestamp=1382524503605000)
-------------------
RowKey: user003
=> (name=gender, value=m, timestamp=1382524887595000)
=> (name=hobby, value=driving, timestamp=1382524567386000)
=> (name=name, value=charlie, timestamp=138252454571)

2 Rows Returned.
Elapsed time: 39 msec(s).
```

But now Thrift looks likely to be obsoleted sooner or later, and I hesitate to adopt it for a new project (cf. http://www.mail-archive.com/dev@cassandra.apache.org/msg06560.html). The trouble is that I don't see how to create such a table in CQL3. I can create a wide-row table in CQL3 with an index on *ALL* columns of the profile, but I think that's too much. I want to index only *specified* columns.
```
CREATE TABLE profile_CQL3 (
    id text,
    column text,
    value text,
    PRIMARY KEY (id, column)
);

CREATE INDEX idx_profile_val ON profile_CQL3 (value);

INSERT INTO profile_CQL3 (id, column, value) VALUES ('user001', 'name', 'alice');
INSERT INTO profile_CQL3 (id, column, value) VALUES ('user001', 'age', '18');
INSERT INTO profile_CQL3 (id, column, value) VALUES ('user001', 'gender', 'f');
INSERT INTO profile_CQL3 (id, column, value) VALUES ('user002', 'name', 'bob');
INSERT INTO profile_CQL3 (id, column, value) VALUES ('user002', 'age', '33');
INSERT INTO profile_CQL3 (id, column, value) VALUES ('user002', 'gender', 'm');
INSERT INTO profile_CQL3 (id, column, value) VALUES ('user003', 'name', 'charlie');
INSERT INTO profile_CQL3 (id, column, value) VALUES ('user003', 'hobby', 'driving');
INSERT INTO profile_CQL3 (id, column, value) VALUES ('user003', 'gender', 'm');

SELECT * FROM profile_CQL3 WHERE value='m' AND column='gender';
```

So, do you know a good way to do this in CQL3? Or should I adopt Thrift for this project? (The user profile is just an example; we would like to provide a versatile datastore with properties like this, i.e. like the Thrift API.) Thanks.
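One pattern sometimes suggested for this kind of per-property indexing is to skip the built-in secondary index and maintain a manual index table for only the properties a user marks as indexed. This is a sketch, not from the thread; the table name and layout are my own:

```cql
-- Partition by (property name, property value); each matching profile id
-- becomes one clustering column in that partition.
CREATE TABLE profile_index (
    column text,
    value  text,
    id     text,
    PRIMARY KEY ((column, value), id)
);

-- On every write to an indexed property, write both tables:
INSERT INTO profile_CQL3  (id, column, value) VALUES ('user002', 'gender', 'm');
INSERT INTO profile_index (column, value, id) VALUES ('gender', 'm', 'user002');

-- "Which profiles have gender = 'm'?" is then a single-partition read:
SELECT id FROM profile_index WHERE column = 'gender' AND value = 'm';
```

The cost is that the application must clean up stale index entries itself when a property is updated or deleted.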
Data Model for time series with multiple interval / sub-sample
Hi,

I'm very new to Cassandra and trying it out. I have a couple of questions regarding the design of the database.

We have an API to store time-series sensor data with millisecond precision. Users can do CRUD operations through the RESTful API. When retrieving data, users can specify `start_date` and `end_date` as epoch timestamps. Every GET request is paginated with a maximum of 1000 items per page. Users can also specify a data interval, one of: 604800 (1 week), 86400 (1 day), 3600 (1 hr), 1800 (30 min), 600 (10 min), 300 (5 min), or 60 (1 min).

My initial design is:
1. a table for storing raw data
2. a table for each sensor interval
3. sensor as row
4. timestamp as column

The current problem is deletion of data. Say I have stored 120 data points, 1 point every second for 2 minutes. Each interval is populated with the last data point received in that interval. This means:
- 120 columns in the raw table
- 2 columns in the '1 min' interval table
- 1 column in each other interval table

Now say I delete one data point. This means I have to fetch all the interval data the point belongs to, and also the raw data around the deleted point, to either update or remove the corresponding entry in each interval table. We also support deleting data by time range, which probably makes this an even more complex operation.

Is this design correct, or is there a better design for modeling the data?

Ahmy Yulrizka
http://ahmy.yulrizka.com
@yulrizka
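For reference, the raw-plus-interval layout described above might look like this in CQL3 (a sketch only; table and column names are mine, not from the post):

```cql
-- Raw samples: one partition per sensor, ordered by millisecond timestamp.
CREATE TABLE raw_data (
    sensor_id text,
    ts        timestamp,
    value     double,
    PRIMARY KEY (sensor_id, ts)
);

-- One table per interval; for the 60 s table, bucket_ts is the sample
-- timestamp rounded down to the minute, and value holds the last raw
-- sample seen in that bucket.
CREATE TABLE data_1min (
    sensor_id text,
    bucket_ts timestamp,
    value     double,
    PRIMARY KEY (sensor_id, bucket_ts)
);

-- A paginated GET between start_date and end_date maps to a slice:
SELECT * FROM data_1min
 WHERE sensor_id = 'sensor-42'
   AND bucket_ts >= '2013-10-01 00:00:00+0000'
   AND bucket_ts <  '2013-10-02 00:00:00+0000'
 LIMIT 1000;
```

The deletion problem then becomes: removing a row from raw_data may require re-reading the enclosing bucket in raw_data to recompute (or delete) the corresponding row in every interval table.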
Re: [RELEASE] Apache Cassandra 1.2.11 released
Yes, it's included under commit 639c01a3504b.

--
Cyril SCETBON

On 23 Oct 2013, at 19:33, Janne Jalkanen janne.jalka...@ecyrd.com wrote:

Question - is https://issues.apache.org/jira/browse/CASSANDRA-6102 in 1.2.11 or not? CHANGES.txt says it's not, JIRA says it is.

/Janne (temporarily unable to check out the git repo)

On Oct 22, 2013, at 13:48, Sylvain Lebresne sylv...@datastax.com wrote:

The Cassandra team is pleased to announce the release of Apache Cassandra version 1.2.11.

Cassandra is a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. You can read more here: http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download section: http://cassandra.apache.org/download/

This version is a maintenance/bug fix release[1] on the 1.2 series. As always, please pay attention to the release notes[2] and let us know[3] if you encounter any problems.

Enjoy!

[1]: http://goo.gl/xjiN74 (CHANGES.txt)
[2]: http://goo.gl/r5pVU2 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
Re: MemtablePostFlusher pending
Hello Aaron,

I hope you had a nice flight.

> Any information on how you are using cassandra, does the zero columns no row delete idea sound like something you are doing?

I do not know what I could do about that, but we use an old version of phpcassa (0.8.a.2) that is not explicitly compatible with Cassandra 2.0, though it works fine for us.

Another thing that could help: when I created the keyspace I did:

```
CREATE KEYSPACE ks01 WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1b': 3};
USE ks01;
DROP KEYSPACE ks01;
CREATE KEYSPACE ks01 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
```

In JIRA I saw that creating a keyspace after a drop has already caused a bug in the past (CASSANDRA-4219).

Thanks,
Kais

2013/10/23 Aaron Morton aa...@thelastpickle.com:

On a plane and cannot check jira but…

```
ERROR [FlushWriter:216] 2013-10-07 07:11:46,538 CassandraDaemon.java (line 186) Exception in thread Thread[FlushWriter:216,5,main]
java.lang.AssertionError
    at org.apache.cassandra.io.sstable.SSTableWriter.rawAppend(SSTableWriter.java:198)
```

Happened because we tried to write a row to disk that had zero columns and was not a row level tombstone.

```
ERROR [ValidationExecutor:2] 2013-10-23 08:39:27,558 CassandraDaemon.java (line 185) Exception in thread Thread[ValidationExecutor:2,1,main]
java.lang.AssertionError
    at org.apache.cassandra.db.compaction.PrecompactedRow.update(PrecompactedRow.java:171)
    at org.apache.cassandra.repair.Validator.rowHash(Validator.java:198)
    at org.apache.cassandra.repair.Validator.add(Validator.java:151)
```

I *think* this is happening for similar reasons.
(notes to self below)…

```java
public PrecompactedRow(CompactionController controller, List<SSTableIdentityIterator> rows)
{
    this(rows.get(0).getKey(),
         removeDeletedAndOldShards(rows.get(0).getKey(), controller, merge(rows, controller)));
}
```

results in a call to this on CFS:

```java
public static ColumnFamily removeDeletedCF(ColumnFamily cf, int gcBefore)
{
    cf.maybeResetDeletionTimes(gcBefore);
    return cf.getColumnCount() == 0 && !cf.isMarkedForDelete() ? null : cf;
}
```

If the CF has zero columns and is not marked for delete, the returned CF will be null, and the PrecompactedRow will be created with a null cf. This is the source of the assertion.

Any information on how you are using cassandra? Does the zero columns, no row delete idea sound like something you are doing?

This may already be fixed. Will take a look later when on the ground.

Cheers

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 23/10/2013, at 9:50 PM, Kais Ahmed k...@neteck-fr.com wrote:

Thanks Robert. For info, if it helps to fix the bug: I'm starting the downgrade. I restarted all the nodes and ran a repair, and there are a lot of errors like this:

```
ERROR [ValidationExecutor:2] 2013-10-23 08:39:27,558 Validator.java (line 242) Failed creating a merkle tree for [repair #9f9b7fc0-3bbe-11e3-a220-b18f7c69b044 on ks01/messages, (8746393670077301406,8763948586274310360]], /172.31.38.135 (see log for details)
ERROR [ValidationExecutor:2] 2013-10-23 08:39:27,558 CassandraDaemon.java (line 185) Exception in thread Thread[ValidationExecutor:2,1,main]
java.lang.AssertionError
    at org.apache.cassandra.db.compaction.PrecompactedRow.update(PrecompactedRow.java:171)
    at org.apache.cassandra.repair.Validator.rowHash(Validator.java:198)
    at org.apache.cassandra.repair.Validator.add(Validator.java:151)
    at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:798)
    at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:60)
    at org.apache.cassandra.db.compaction.CompactionManager$8.call(CompactionManager.java:395)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
```

And the repair stops after this error:

```
ERROR [FlushWriter:9] 2013-10-23 08:39:32,979 CassandraDaemon.java (line 185) Exception in thread Thread[FlushWriter:9,5,main]
java.lang.AssertionError
    at org.apache.cassandra.io.sstable.SSTableWriter.rawAppend(SSTableWriter.java:198)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:186)
    at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:358)
    at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:317)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at
```
Re: Compaction issues
On 23 October 2013 21:25, Aaron Morton aa...@thelastpickle.com wrote:

> Is there ever a time when the pending count is non zero but nodetool compactionstats does not show any running tasks?

No - but there are times when the number of running compaction processes is less than concurrent_compactors, which is confusing.

> If compaction cannot keep up you may be generating data faster than LCS can compact it. What sort of disks do you have?

I'm beginning to suspect this is probably the case. These machines only have one (conventional) disk, although we never had a problem with SizeTieredCompactionStrategy.

> What is the min_sstable_size? The old default was 5, the new one is 130. The higher value will result in less IO.

It's 256MB.

--
Russ Garrett
r...@garrett.co.uk
Problems using secondary index with IN keyword
Hi,

I have a table that (in a simplified version) looks like this:

```
CREATE TABLE mytable (
    a varchar,
    b varchar,
    c varchar,
    d timestamp,
    e varchar,
    PRIMARY KEY (a, b, c, d)
);

CREATE INDEX mytable_c_idx ON mytable (c);
```

After populating it, I execute:

SELECT * FROM mytable WHERE c='myvalue';

which works fine. However, using:

SELECT * FROM mytable WHERE c IN ('myvalue');

gives me:

Bad Request: PRIMARY KEY part c cannot be restricted (preceding part b is either not restricted or by a non-EQ relation)

Can anybody explain this? My aim is to query for more than one value in the c column. Is this supported?

Thanks,
Petter
Re: Problems using secondary index with IN keyword
Petter,

On 24.10.2013, at 14:38, Petter von Dolwitz (Hem) petter.von.dolw...@gmail.com wrote:

> However, using: SELECT * FROM mytable WHERE c IN ('myvalue'); gives me: Bad Request: PRIMARY KEY part c cannot be restricted (preceding part b is either not restricted or by a non-EQ relation). Can anybody explain this?

You cannot use a primary key part in a where clause unless you specify the preceding parts also. Think of it this way: to resolve your restriction, C* would have to do a full scan over all rows to find those rows that have a 'myvalue' c-part.

Jan

> My aim is to query for more than one value in the c column. Is this supported?
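For what it's worth, the rule being described applies to clustering-key restrictions, and in the 2.0 line an IN restriction on a clustering column is, as far as I recall, only accepted on the *last* one. A sketch against the schema above (the values are made up):

```cql
-- Accepted: a, b, c are fixed with =, and d (the last clustering
-- column) carries the IN restriction.
SELECT * FROM mytable
 WHERE a = 'a1' AND b = 'b1' AND c = 'myvalue'
   AND d IN ('2013-10-24 00:00:00+0000', '2013-10-25 00:00:00+0000');

-- Rejected with the "preceding part b" error: c is restricted by IN
-- while b is unrestricted, and the secondary index path only kicks in
-- for a plain equality on c, as in WHERE c = 'myvalue'.
-- SELECT * FROM mytable WHERE c IN ('myvalue');
```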
Re: Problems using secondary index with IN keyword
> You cannot use a part in a where clause unless you specify the preceding parts also.

But the statement SELECT * FROM mytable WHERE c='myvalue'; works? What are secondary indexes for, then, if you can't use them in this way?

Forgot to mention that I am on Cassandra 2.0.1.

/Petter

2013/10/24 Jan Algermissen jan.algermis...@nordsc.com
Re: read latencies?
The key is this line:

Read 827 live and 6948 tombstoned cells

That means you either have a lot of deleted or TTLed columns in that row. One option to help with that is to set a lower gc_grace for the table and repair more frequently; this will help tombstones get purged more quickly. Another option is to adjust your data model so that you periodically switch to a new row.

On Wed, Oct 23, 2013 at 7:20 PM, Matt Mankins mmank...@fastcompany.com wrote:

Hi. I have a table with about 300k rows in it, and am doing a query that returns about 800 results:

SELECT * FROM fc.co WHERE thread_key = 'fastcompany:3000619';

The read latencies seem really high (upwards of 500ms). Is this expected? Is this bad schema, or…? What's the best way to trace the bottleneck, besides this tracing query: http://pastebin.com/sherFpgY? Or, how would you interpret that? I'm not sure that row caches are being used, despite them being turned on in the cassandra.yaml file.

I'm using a 3-node cluster on Amazon, using DataStax Community Edition, Cassandra 2.0.1, in the same EC2 availability zone.

Many thanks,
@Mankins

--
Tyler Hobbs
DataStax
http://datastax.com/
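Both options can be sketched in CQL (the gc_grace value and the bucketed schema below are illustrative, not from the thread):

```cql
-- Option 1: let tombstones become purgeable sooner. Only safe if every
-- node is repaired more often than this window.
ALTER TABLE fc.co WITH gc_grace_seconds = 86400;   -- 1 day vs. the 10-day default

-- Option 2: fold a time bucket into the partition key so reads stop
-- touching old, tombstone-heavy partitions (column names are made up).
CREATE TABLE co_bucketed (
    thread_key text,
    day        text,
    comment_id timeuuid,
    body       text,
    PRIMARY KEY ((thread_key, day), comment_id)
);

-- Reads then name the bucket explicitly:
SELECT * FROM co_bucketed
 WHERE thread_key = 'fastcompany:3000619' AND day = '2013-10-23';
```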
Re: Problems using secondary index with IN keyword
On 24.10.2013, at 15:13, Petter von Dolwitz (Hem) petter.von.dolw...@gmail.com wrote:

> But the statement SELECT * FROM mytable WHERE c='myvalue'; works? What are secondary indexes for then if you can't use them in this way?

What you have in your example are parts of a primary key. Secondary indexes are defined in a different way. (I never use them, because they sort of go against the whole point of using C*, IMHO, so I don't know right now how to do it; check the docs.)

Jan

> Forgot to mention that I am on Cassandra 2.0.1. /Petter
Re: Problems using secondary index with IN keyword
I have defined the secondary index on a field that is part of the primary key. This should be OK. Maybe you missed the CREATE INDEX bit in my original post.

I might end up not using secondary indexes, but since the feature is there and I need the functionality, I would like to know its limitations, and whether the behaviour I am experiencing is a problem with my design, a problem with the data, or a problem with Cassandra.

/Petter

Den torsdagen den 24:e oktober 2013 skrev Jan Algermissen:

> What you have in your example is parts of a primary key. Secondary indexes are defined in a different way.
what does nodetool compact command do for leveled compactions?
Hi,

I have a column family created with the leveled compaction strategy. If I execute the nodetool compact command, will the column family be compacted using the size-tiered compaction strategy? If yes, after the major size-tiered compaction finishes, will it at any point trigger leveled compaction back on the column family?

Thanks,
Rashmi
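For context, the compaction strategy itself is a per-table schema setting, separate from whatever `nodetool compact` triggers. A sketch (the keyspace, table, and size value here are made up):

```cql
-- Declaring leveled compaction on a table:
CREATE TABLE ks.events (
    id  text PRIMARY KEY,
    val text
) WITH compaction = {'class': 'LeveledCompactionStrategy',
                     'sstable_size_in_mb': 160};

-- Moving a table between strategies is an explicit schema change:
ALTER TABLE ks.events
  WITH compaction = {'class': 'SizeTieredCompactionStrategy'};
```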
Re: what does nodetool compact command do for leveled compactions?
On Thu, Oct 24, 2013 at 3:13 PM, rash aroskar rashmi.aros...@gmail.com wrote:

> I have a column family created with strategy of leveled compaction. If I execute nodetool compact command, will the columnfamily be compacted using size tiered compaction strategy?

No.

> If yes, after the major size tiered compaction finishes will it at any point trigger leveled compaction back on the column family?

https://issues.apache.org/jira/browse/CASSANDRA-6092 and especially:

https://issues.apache.org/jira/browse/CASSANDRA-6092?focusedCommentId=13793012&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13793012

tl;dr: if you have more than a single SSTable, it will do something. There is disagreement over whether it should do something if you only have a single SSTable. My view, for the record, is yes.

=Rob
Re: Problems using secondary index with IN keyword
On 24.10.2013, at 22:00, Petter von Dolwitz (Hem) petter.von.dolw...@gmail.com wrote:

> I have defined the secondary index on a field that is part of the primary key. This should be OK. Maybe you missed the CREATE INDEX bit in my original post.

Yes I did - despite actually looking for it :-) Sorry. Then I am curious, too, what the intended behaviour is.

Jan
Re: Problems using secondary index with IN keyword
Before CASSANDRA-1337 [1] (Cassandra 2.1 line) secondary indexes + vnodes can have very slow performance due to the potentially large number of splits/nodes sequentially scanned. =Rob [1] https://issues.apache.org/jira/browse/CASSANDRA-1337
Re: what does nodetool compact command do for leveled compactions?
Thanks for the reply. I have one more question. If multiple columns with identical names but different timestamps are bulk-loaded (with sstableloader) into a CF, and LCS is running in the background, would a slice predicate query retrieve multiple columns with the same name, assuming compaction isn't finished yet and there are still SSTables sitting in level 0? Or do queries return only the column with the most recent timestamp for each name regardless? (I would guess the latter, but I wanted to make sure.)

On Oct 24, 2013 7:08 PM, Robert Coli rc...@eventbrite.com wrote:

> tl;dr: if you have more than a single SSTable, it will do something. There is disagreement over whether it should do something if you only have a single SSTable. My view, for the record, is yes.
Re: what does nodetool compact command do for leveled compactions?
On Thu, Oct 24, 2013 at 4:58 PM, Jayadev Jayaraman jdisal...@gmail.com wrote:

> Or do queries return only the columns with the most recent timestamp for each name regardless? (I would guess at the latter but I wanted to make sure)

This.

=Rob
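To illustrate the reconciliation rule (a sketch; the table and values below are made up, and the same applies per column in Thrift terms): reads resolve duplicate cells by the highest write timestamp, regardless of which SSTable or level the copies live in.

```cql
CREATE TABLE ks.demo (
    key text,
    col text,
    val text,
    PRIMARY KEY (key, col)
);

-- Two writes to the same cell with explicit, conflicting timestamps,
-- as a bulk load of duplicate columns would produce:
INSERT INTO ks.demo (key, col, val) VALUES ('k', 'c', 'old') USING TIMESTAMP 1000;
INSERT INTO ks.demo (key, col, val) VALUES ('k', 'c', 'new') USING TIMESTAMP 2000;

-- Whether or not compaction has merged the underlying SSTables yet,
-- a read sees a single cell: the one with the highest timestamp ('new').
SELECT val FROM ks.demo WHERE key = 'k' AND col = 'c';
```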