Re: CompositeType for row Keys
If you are using OPP (an order-preserving partitioner), then you can use CompositeType on both the key and the column name; otherwise (RandomPartitioner), just use it for columns.

On 22/07/2011 17:10, Patrick Julien wrote:
> With the current implementation of CompositeType in Cassandra 0.8.1, is it recommended practice to use a CompositeType as the key? Or are both, column and key, equally well supported? The documentation on CompositeType is light, well, non-existent really. With key_validation_class set to CompositeType(UUIDType, IntegerType), can we query all matching rows just by using CompositeType(UUIDType)? In my specific use case, what would work best is to have composite row keys, with thousands of columns each.

-- 
Donal Zang
Computing Center, IHEP
19B YuquanLu, Shijingshan District, Beijing, 100049
zan...@ihep.ac.cn
86 010 8823 6018
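The reply above hinges on how the partitioner orders row keys. Here is a small pure-Python illustration (not Cassandra code; the token computation is simplified) of why key ranges are meaningful under an order-preserving partitioner but not under RandomPartitioner, which places rows by the MD5 hash of the key:

```python
import hashlib

def token_random(key):
    # RandomPartitioner derives the row's position from an MD5 hash,
    # so rows are stored in hash order, not key order.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def token_opp(key):
    # An order-preserving partitioner uses the key itself as the token,
    # so storage order equals key order and key-range scans make sense.
    return key

keys = ["user:001", "user:002", "user:003", "user:004"]
print(sorted(keys, key=token_opp) == keys)  # True: OPP keeps key order
print(sorted(keys, key=token_random))       # hash order, unrelated to key order
```

Under RandomPartitioner a composite row key still works fine for exact lookups; what you lose is any meaningful range over the key's components.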
Re: CompositeType for row Keys
On 22/07/2011 17:56, Patrick Julien wrote:
> I can still use it for keys if I don't need ranges then? Because for what we are doing we can always re-assemble keys.

Yes, but why would you use CompositeType if you don't need range queries?

> On Fri, Jul 22, 2011 at 11:38 AM, Donal Zang <zan...@ihep.ac.cn> wrote:
>> If you are using OPP, then you can use CompositeType on both key and column name; otherwise (RandomPartitioner), just use it for columns.
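"Re-assembling keys" client-side can be as simple as packing and unpacking the components yourself. A minimal sketch with hypothetical helpers (this is not CompositeType's actual on-disk encoding, which adds length prefixes and end-of-component bytes):

```python
import struct
import uuid

# Pack a (UUID, int) pair into one opaque row key and unpack it again.
# Even without range queries, such a key is still usable for exact lookups.
def pack_key(u, n):
    return u.bytes + struct.pack(">i", n)

def unpack_key(raw):
    return uuid.UUID(bytes=raw[:16]), struct.unpack(">i", raw[16:])[0]

u = uuid.uuid4()
raw = pack_key(u, 42)
print(unpack_key(raw) == (u, 42))  # True: round trip recovers both parts
```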
Re: [SPAM] Fwd: Counter consistency - are counters idempotent?
On 22/07/2011 18:08, Yang wrote:
> btw, this issue of not knowing whether a write was persisted when the client reports an error is not limited to counters. For regular columns it's the same: if the client reports a write failure, the value may well be replicated to all replicas later. This is also the same in all other systems (Zookeeper, Paxos), ultimately due to the FLP impossibility result: there is no guaranteed consensus in asynchronous systems.

Yes, but with regular columns a retry is OK, while with counters it is not.

> ---------- Forwarded message ----------
> From: Sylvain Lebresne <sylv...@datastax.com>
> Date: Fri, Jul 22, 2011 at 8:03 AM
> Subject: Re: Counter consistency - are counters idempotent?
> To: user@cassandra.apache.org
>
> On Fri, Jul 22, 2011 at 4:52 PM, Kenny Yu <kenny...@knewton.com> wrote:
>> As of Cassandra 0.8.1, are counter increments and decrements idempotent? If, for example, a client sends an increment request and the increment occurs, but the network subsequently fails and reports a failure to the client, will Cassandra retry the increment (thus leading to an overcount and inconsistent data)? I have done some reading and I am getting conflicting sources about counter consistency. In this source (http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/clarification-of-the-consistency-guarantees-of-Counters-td6421010.html), it states that counters now have the same consistency as regular columns. Does this imply that the above example will not lead to an overcount?
>
> That email thread was arguably a bit imprecise with its use of the word 'consistency', but what it was really talking about is consistency level. That is, counters support all the usual consistency levels (ONE, QUORUM, ALL, LOCAL_QUORUM, EACH_QUORUM) except ANY. Counters are still not idempotent.
> And one small clarification: if you get a TimeoutException, Cassandra never retries the increment on its own (your sentence suggests it does), but in that case you won't know whether the increment was persisted, and thus you won't know whether you should retry. And yes, this is still a limitation of counters.
>
>> If counters are not idempotent, are there examples of effective uses of counters that will prevent inconsistent counts? Thank you for your help.
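The overcount scenario described above is easy to model. A toy Python sketch (made-up names, no Cassandra involved) of why retrying a timed-out counter increment is unsafe, while retrying a regular column write is harmless:

```python
class FlakyCounter:
    """Toy model: the increment is applied server-side, but the ack can
    be lost, so the client sees a timeout even though the write landed."""
    def __init__(self):
        self.value = 0

    def incr(self, ack_lost=False):
        self.value += 1            # the write is persisted...
        if ack_lost:
            raise TimeoutError     # ...but the client never learns it

c = FlakyCounter()
try:
    c.incr(ack_lost=True)   # client sees a timeout
except TimeoutError:
    c.incr()                # naive retry
print(c.value)  # 2, although the client intended a single +1

# Contrast: retrying a regular column write is idempotent, because
# writing the same value twice leaves the column unchanged.
row = {}
row["col"] = "v"   # first attempt (persisted, ack lost)
row["col"] = "v"   # retry: same final state
print(row)  # {'col': 'v'}
```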
Re: Counter Column
On 27/06/2011 19:19, Sylvain Lebresne wrote:
> Let me make that simpler. Don't ever use replicate_on_write=false (even if you think that it is what you want, there is a good chance it's not). Obviously, the default is replicate_on_write=true.

I may be wrong, but with 0.8.0 I think the default is replicate_on_write=false; you have to declare it explicitly.
Re: Counter Column
On 27/06/2011 17:04, Artem Orobets wrote:
> Hi! As I know, we use counter columns only with consistency level ALL, so does that mean we can't read data when any replica fails?

You can use any consistency level, as long as you use replicate_on_write=true when creating the counter column family.
Re: Querying superColumn
Well, you are looking for a secondary index. But for now, AFAIK, super columns cannot use secondary indexes.

On 16/06/2011 13:55, Vivek Mishra wrote:
> Now for rowKey 'DEPT1' I have inserted multiple super columns like:
>
> Employee1 {
>   Name: Vivek
>   country: India
> }
>
> Employee2 {
>   Name: Vivs
>   country: USA
> }
>
> Now if I want to retrieve a super column whose row key is 'DEPT1' and employee name is 'Vivek', can I get only 'Employee1'?
Re: insert slowdown with secondary indexes
On 11/06/2011 02:27, jodylandren...@comcast.net wrote:
> I'm trying to understand why doing the inserts into a column family with indexes seems to jam things up, and am wondering if there are any settings that I could tweak to help. It seems that the 4-node cluster should be able to handle 2 threads of data coming at it. Has anyone had any experience with this number of indexes per column family? Any insight or suggestions would be appreciated.

Hi, I posted an email about this a while ago; see the mailing list archive. The secondary index currently uses a hash method, which causes random I/O on insertion (and hence lots of swap activity). Queries based on it are slow too. So my advice would be: don't use the secondary index, at least for now (there are plans to build a bitmap index [1]). You can try Ed Anuff's method [2] of building a CF as your index; it's much faster than the secondary index. (This method may need the CompositeType [3].)

[1] https://issues.apache.org/jira/browse/CASSANDRA-1472
[2] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
[3] https://issues.apache.org/jira/browse/CASSANDRA-2231
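The CF-as-index pattern from the anuff.com post can be sketched with plain dicts standing in for column families (illustrative names only; a real implementation would use batch mutations and deal with concurrent updates):

```python
from collections import defaultdict

# Alongside the data "CF", maintain an index "CF" whose row key is the
# indexed value and whose column names are the matching data row keys.
data = {}                     # row_key -> {column: value}
index = defaultdict(dict)     # indexed_value -> {row_key: ''}

def insert(row_key, columns, indexed_col="country"):
    old = data.get(row_key, {}).get(indexed_col)
    if old is not None and old != columns.get(indexed_col):
        del index[old][row_key]                  # drop the stale index entry
    data[row_key] = columns
    index[columns[indexed_col]][row_key] = ''    # column value is a placeholder

insert("emp1", {"name": "Vivek", "country": "India"})
insert("emp2", {"name": "Vivs",  "country": "USA"})
insert("emp2", {"name": "Vivs",  "country": "India"})  # indexed value changes

print(sorted(index["India"]))  # ['emp1', 'emp2']
print(index["USA"])            # {} after the update
```

Querying "all rows where country = India" is then a single column slice on the index row, followed by a multiget of the returned keys.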
Re: [SPAM] Re: slow insertion rate with secondary index
On 06/06/2011 10:15, David Boxenhorn wrote:
> Is there really a 10x difference between indexed CFs and non-indexed CFs?

Well, in my test there is! I'm using 0.7.6-2, 9 nodes, 3 replicas, write consistency level QUORUM, about 90,000,000 rows (~1 KB per row). I use 20 processes, 20 rows per insertion. The insertion time for the whole row is about 0.02 seconds without an index. After I add a secondary index and update every row with the indexed column, the insertion time is about 2 seconds; and if I remove the index and update the column, the time is about 0.002 seconds.

Another thing I noticed: if you first do the insertion, then build the secondary index using "update column family ...", and then do a select based on the index, the result is not right (it seems the index is still being built, though the update command returns quickly). And after a while, get_indexed_slices() times out from time to time (with pycassa.ConnectionPool('keyspace1', ['host1','host2'], timeout=600, pool_size=1)).

Does anyone else have similar experiences with secondary indexes?
slow insertion rate with secondary index
I did an insertion test with and without secondary indexes, and found:

Without a secondary index: ~10864 rows inserted per second
With a secondary index on one column (BytesType): ~1515 rows inserted per second

Is this normal? Why does a secondary index have so much effect? I noticed that if I build the index using "update column family ..." after inserting all the data (90578207 rows), it finishes very quickly. I'm not very clear on how the secondary index works; could someone explain this? Thanks!

Donal
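One plausible reason for the gap, sketched as toy operation accounting (this is an illustration, not a benchmark, and the real write path is more involved): maintaining an index on each insert requires reading the row's previous indexed value (a random read) plus an extra write to the index, while a plain insert is append-only.

```python
class Table:
    def __init__(self, indexed):
        self.indexed = indexed
        self.rows = {}
        self.reads = 0
        self.writes = 0

    def insert(self, key, value):
        if self.indexed:
            self.rows.get(key)   # read the old value to update the index
            self.reads += 1
            self.writes += 1     # extra write to the index structure
        self.rows[key] = value
        self.writes += 1

plain, idx = Table(False), Table(True)
for i in range(1000):
    plain.insert(i, i)
    idx.insert(i, i)
print(plain.reads, plain.writes)  # 0 1000
print(idx.reads, idx.writes)      # 1000 2000
```

This also suggests why building the index after a bulk load is fast: a single sequential scan replaces per-insert random reads.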
get_indexed_slices count api
Hi, I want to run a query like "select count(*) from table where column1 = v1 and ...", based on a secondary index on column1. But using get_indexed_slices(), I have to fetch all the rows and count them client-side, which is not needed. So a get_indexed_slices count API [1] would be very helpful, but it seems no one is working on it now. (I see it's related to [2], which is blocked by [3].)

My question is: will a get_indexed_slices count API be provided? Or should I build CF-based indexes myself, as in http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html?

Thanks!
Donal

[1] https://issues.apache.org/jira/browse/CASSANDRA-2601
[2] https://issues.apache.org/jira/browse/CASSANDRA-1600
[3] https://issues.apache.org/jira/browse/CASSANDRA-1034
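Until such an API exists, a client can approximate it by paging through the matching keys and counting, discarding column data. A pure-Python stand-in for that loop (the paging mirrors what one would do over get_indexed_slices; no Cassandra client is involved, and the names are illustrative):

```python
def count_indexed(rows, column, value, page_size=100):
    # Stand-in for the index lookup: the keys whose indexed column matches.
    matching = sorted(k for k, cols in rows.items() if cols.get(column) == value)
    total, start = 0, 0
    while True:
        page = matching[start:start + page_size]  # one "page" of row keys
        total += len(page)
        if len(page) < page_size:                 # short page: we are done
            return total
        start += page_size

rows = {i: {"c1": "v1" if i % 3 == 0 else "v2"} for i in range(1000)}
print(count_indexed(rows, "c1", "v1"))  # 334
```

This still transfers every matching key over the wire, which is exactly the overhead a server-side count would remove.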
when 0.8.1 will be released?
Hi, is there a planned date for the 0.8.1 release? I want to use the CompositeType comparator: https://issues.apache.org/jira/browse/CASSANDRA-2231 Thanks! Donal
statistics query on cassandra
Can we do a count like this?

count cf[startKey:endKey] where column = value
NullPointerException with 0.7.4
Hi, I'm doing a stress test, and cassandra crashed with this exception:

ERROR [MutationStage:9] 2011-04-03 21:11:50,152 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.NullPointerException
        at org.apache.cassandra.io.sstable.IndexSummary$KeyPosition.compareTo(IndexSummary.java:100)
        at org.apache.cassandra.io.sstable.IndexSummary$KeyPosition.compareTo(IndexSummary.java:87)
        at java.util.Collections.indexedBinarySearch(Collections.java:232)
        at java.util.Collections.binarySearch(Collections.java:218)
        at org.apache.cassandra.io.sstable.SSTableReader.getIndexScanPosition(SSTableReader.java:333)
        at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:459)
        at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:563)
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:61)
        at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:58)
        at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1353)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1245)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1173)
        at org.apache.cassandra.db.Table.readCurrentIndexedColumns(Table.java:459)
        at org.apache.cassandra.db.Table.apply(Table.java:394)
        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:76)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)

-- 
Donal Zang
CERN PH-ADP-DDM
40-3-D16 CH-1211 Geneve 23
donal.z...@cern.ch
+41 22 76 71268
apache.org down?
As per the subject: is apache.org down? -- Donal Zang
Re: Cassandra automatic startup script on ubuntu
On 20/01/2011 17:51, Sébastien Druon wrote:
> Hello! I am using cassandra on an ubuntu machine and installed it from the binary found on the cassandra home page. However, I did not find any scripts to start it up at boot time. Where can I find this kind of script? Thanks a lot in advance. Sebastien

Hi, this is what I do; you can add the watchdog to rc.local:

#!/bin/bash
#
# This script checks every $INTERVAL seconds
# whether cassandra is working well,
# and restarts it if necessary
# by donal 2010-01-11
#
PORT=9160
INTERVAL=2
CASSANDRA=/opt/cassandra

check() {
    netstat -tln | grep LISTEN | grep :$1
    if [ $? != 0 ]; then
        echo restarting cassandra
        $CASSANDRA/bin/stop-server
        sleep 1
        $CASSANDRA/bin/start-server
    fi
}

while true
do
    check $PORT
    sleep $INTERVAL
done
should nodetool repair be run periodically to keep consistency?
Just to confirm: so this should be done manually by the cluster operators? Thanks!
Re: Between Clause
On 17/01/2011 11:55, kh jo wrote:
> What is the best way to model a query with a between clause, given that you have a large number of entries? Thanks, Jo

In my experience, for a row-based 'between clause' with a random partitioner, you should design the column family carefully so that you can get all the rows' keys. In that case you can use multi_get() instead of get_range(), and you can do range slices between columns within a row.
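The pattern described above can be sketched in plain Python, with dicts standing in for column families (illustrative names; in a real CF the comparator keeps column names sorted, so the key-range step is a single column slice):

```python
# One "directory" row records every row key as a column name; a between
# query slices that row for the key range, then multigets the results.
directory = {}   # column name -> '' (sorted by the comparator in a real CF)
rows = {}        # row key -> columns

def insert(key, columns):
    rows[key] = columns
    directory[key] = ''   # register the key as a column name

def between(start, end):
    keys = [k for k in sorted(directory) if start <= k <= end]  # column slice
    return {k: rows[k] for k in keys}                           # multiget

for k in ["a01", "a05", "b02", "b07", "c03"]:
    insert(k, {"v": k.upper()})
print(sorted(between("a05", "b07")))  # ['a05', 'b02', 'b07']
```

The directory row works under RandomPartitioner because the ordering lives in the column names, not in the row keys; its limit is that all keys share one row.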
Can super column family use column_metadata?
Hi, I'm using 0.7.0-rc1, and when I use cassandra-cli to create a column family with metadata, I get null and no column family is created. The commands I use are:

create keyspace test;
use test;
create column family test1 with column_type = 'Super' and comparator = 'LongType' and column_metadata = [{column_name: a, validation_class: LongType}];

Also, the examples given by "help create column family;" don't work! Any ideas? Thanks!
how to see how many rows in each node?
As per the subject: is there any command or API for this? Thanks!