Re: Memory Usage of a connection
On Fri, Aug 31, 2012 at 11:27 AM, Peter Schuller peter.schul...@infidyne.com wrote:

> Could these 500 connections/second cause (on average) 2600 MB of memory usage per 2 seconds, i.e. ~1300 MB/second? That would be around 2-3 MB per connection.

In terms of garbage generated, it is much less about the number of connections than about what you are doing with them. Are you, for example, requesting large amounts of data? Large or many columns (or both)? Essentially all working data that a request touches is allocated on the heap and contributes to the allocation rate and ParNew frequency.

> Write requests are simple counter increments, and the memtables already exist in memory. There is negligible read traffic (100-200 reads/second). Also, it is increasing write traffic that raises GC frequency while read traffic is held constant, so the GC behavior should be independent of reads.

--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
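A standalone sketch of the point above: the allocation rate of short-lived objects, not the connection count, is what drives young-generation (ParNew, under CMS) collection frequency. The class and the numbers are illustrative only, not taken from Cassandra:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative: generate a burst of short-lived heap garbage and watch
// the JVM's collection counters respond. This is what a request that
// touches many/large columns does inside Cassandra.
public class AllocationPressure {

    // Sum collection counts across all collectors (ParNew appears here
    // when running with CMS, as Cassandra 1.x does by default).
    static long gcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount();
            if (c > 0) {
                total += c;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = gcCount();
        long bytes = 0;
        // Simulate request handling: every column a request touches becomes
        // short-lived heap garbage (~800 MB allocated in total here).
        for (int i = 0; i < 200_000; i++) {
            byte[] column = new byte[4 * 1024];
            bytes += column.length;
        }
        long after = gcCount();
        System.out.println("allocated " + bytes + " bytes; collections observed: " + (after - before));
    }
}
```

Doubling the loop count (more data touched) roughly doubles the collections observed, regardless of how many "connections" produced the allocations.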
Re: How to set LeveledCompactionStrategy for an existing table
Hello Aaron,

Thanks for your answer. Jira ticket 4597 created: https://issues.apache.org/jira/browse/CASSANDRA-4597

Jean-Armel

2012/8/31 aaron morton aa...@thelastpickle.com

> Looks like a bug. Can you please create a ticket on https://issues.apache.org/jira/browse/CASSANDRA and update the email thread?
>
> Can you include this: CFPropDefs.applyToCFMetadata() does not set the compaction class on CFM.
>
> Thanks
> - Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 31/08/2012, at 7:05 AM, Jean-Armel Luce jaluc...@gmail.com wrote:
>
>> I tried as you said with cassandra-cli, still unsuccessfully:
>>
>>   [default@unknown] use test1;
>>   Authenticated to keyspace: test1
>>   [default@test1] UPDATE COLUMN FAMILY pns_credentials with compaction_strategy='LeveledCompactionStrategy';
>>   8ed12919-ef2b-327f-8f57-4c2de26c9d51
>>   Waiting for schema agreement...
>>   ... schemas agree across the cluster
>>
>> And then, when I check the compaction strategy, it is still SizeTieredCompactionStrategy:
>>
>>   [default@test1] describe pns_credentials;
>>   ColumnFamily: pns_credentials
>>     Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>>     Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>>     Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>     GC grace seconds: 864000
>>     Compaction min/max thresholds: 4/32
>>     Read repair chance: 0.1
>>     DC Local Read repair chance: 0.0
>>     Replicate on write: true
>>     Caching: KEYS_ONLY
>>     Bloom Filter FP chance: default
>>     Built indexes: []
>>     Column Metadata:
>>       Column Name: isnew
>>         Validation Class: org.apache.cassandra.db.marshal.Int32Type
>>       Column Name: ts
>>         Validation Class: org.apache.cassandra.db.marshal.DateType
>>       Column Name: mergestatus
>>         Validation Class: org.apache.cassandra.db.marshal.Int32Type
>>       Column Name: infranetaccount
>>         Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>>       Column Name: user_level
>>         Validation Class: org.apache.cassandra.db.marshal.Int32Type
>>       Column Name: msisdn
>>         Validation Class: org.apache.cassandra.db.marshal.LongType
>>       Column Name: mergeusertype
>>         Validation Class: org.apache.cassandra.db.marshal.Int32Type
>>     Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
>>     Compression Options:
>>       sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
>>
>> I also tried to create a new table with LeveledCompactionStrategy (using cqlsh), and when I check the compaction strategy, SizeTieredCompactionStrategy is set for this table:
>>
>>   cqlsh:test1> CREATE TABLE pns_credentials3 (
>>            ...   ise text PRIMARY KEY,
>>            ...   isnew int,
>>            ...   ts timestamp,
>>            ...   mergestatus int,
>>            ...   infranetaccount text,
>>            ...   user_level int,
>>            ...   msisdn bigint,
>>            ...   mergeusertype int
>>            ... ) WITH
>>            ...   comment='' AND
>>            ...   read_repair_chance=0.10 AND
>>            ...   gc_grace_seconds=864000 AND
>>            ...   compaction_strategy_class='LeveledCompactionStrategy' AND
>>            ...   compression_parameters:sstable_compression='SnappyCompressor';
>>
>>   cqlsh:test1> describe table pns_credentials3
>>
>>   CREATE TABLE pns_credentials3 (
>>     ise text PRIMARY KEY,
>>     isnew int,
>>     ts timestamp,
>>     mergestatus int,
>>     infranetaccount text,
>>     user_level int,
>>     msisdn bigint,
>>     mergeusertype int
>>   ) WITH
>>     comment='' AND
>>     comparator=text AND
>>     read_repair_chance=0.10 AND
>>     gc_grace_seconds=864000 AND
>>     default_validation=text AND
>>     min_compaction_threshold=4 AND
>>     max_compaction_threshold=32 AND
>>     replicate_on_write='true' AND
>>     compaction_strategy_class='SizeTieredCompactionStrategy' AND
>>     compression_parameters:sstable_compression='SnappyCompressor';
>>
>> Maybe something is wrong in my server. Any idea?
>>
>> Thanks.
>>
>> Jean-Armel
>>
>> 2012/8/30 feedly team feedly...@gmail.com
>>
>>> In cassandra-cli, I did something like:
>>>
>>>   update column family xyz with compaction_strategy='LeveledCompactionStrategy'
>>>
>>> On Thu, Aug 30, 2012 at 5:20 AM, Jean-Armel Luce jaluc...@gmail.com wrote:
>>>
>>>> Hello,
>>>>
>>>> I am using Cassandra 1.1.1 and CQL3. I have a cluster with 1 node (test environment). Could you tell me how to set the compaction strategy to Leveled Strategy for an existing table?
>>>>
>>>> I have a table pns_credentials:
>>>>
>>>>   jal@jal-VirtualBox:~/cassandra/apache-cassandra-1.1.1/bin$ ./cqlsh -3
>>>>   Connected to Test Cluster at localhost:9160.
>>>>   [cqlsh 2.2.0 | Cassandra 1.1.1 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
>>>>   Use HELP for help.
>>>>   cqlsh> use test1;
>>>>   cqlsh:test1> describe table pns_credentials;
>>>>
>>>>   CREATE TABLE pns_credentials (
>>>>     ise text PRIMARY KEY,
>>>>     isnew int,
>>>>     ts timestamp,
>>>>     mergestatus int,
>>>>     infranetaccount text,
>>>>     user_level int,
>>>>     msisdn bigint,
>>>>     mergeusertype int
>>>>   ) WITH
>>>>     comment='' AND
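For reference, on versions where CASSANDRA-4597 is fixed, the same change should be expressible directly in CQL3. A hedged sketch — the exact property syntax changed between Cassandra releases, so verify against your version's documentation:

```sql
-- CQL3 on 1.1.x (once the bug above is fixed):
ALTER TABLE pns_credentials
  WITH compaction_strategy_class='LeveledCompactionStrategy';

-- Later CQL3 (Cassandra 1.2+) uses a map-valued property instead:
ALTER TABLE pns_credentials
  WITH compaction = {'class': 'LeveledCompactionStrategy'};
```

Existing SSTables are not rewritten immediately; they migrate to the leveled layout as compaction proceeds.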
Re: Store a timeline with uniques properties
Hi Aaron,

That's great news... Would you know the name of this feature, so I can look further into it?

Thanks,
Morgan.

On 31 August 2012 at 06:05, aaron morton aa...@thelastpickle.com wrote:

> Consider trying…
>
> UserTimeline CF
>   row_key: user_id
>   column_names: <timestamp, other_user_id, action>
>   column_values: action details
>
> To get the changes between two times, specify the start and end timestamps and do not include the other components of the column name, e.g. from <1234, NULL, NULL> to <6789, NULL, NULL>.
>
> Cheers
> - Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 30/08/2012, at 11:32 PM, Morgan Segalis msega...@gmail.com wrote:
>
>> Sorry for the scheme that has not kept the right tabulation for some people... Here is a space-indented version instead:
>>
>>   user1 row : | lte               | lte -1           | lte -2              | lte -3           | lte -4              |
>>   values :    | user2-name-change | user3-pic-change | user4-status-change | user2-pic-change | user2-status-change |
>>
>> If, for example, user2 changes its picture, the row should look like this:
>>
>>   user1 row : | lte              | lte -1            | lte -2           | lte -3              | lte -4              |
>>   values :    | user2-pic-change | user2-name-change | user3-pic-change | user4-status-change | user2-status-change |
>>
>> On 30 August 2012 at 13:22, Morgan Segalis wrote:
>>
>>> Hi everyone,
>>>
>>> I'm trying to use Cassandra to store a timeline, but with values that must be unique (replaced). (So not really a timeline, but I didn't find a better word for it.)
>>>
>>> Let me give you an example:
>>> - A user has a list of friends
>>> - Friends can change their nickname, status, profile picture, etc.
>>>
>>> At the beginning, the CF will look like this for user1 (lte = latest-timestamp-entry, the timestamp of the entry; -1, -2, -3 mean the timestamps are older):
>>>
>>>   user1 row : | lte               | lte -1           | lte -2              | lte -3           | lte -4              |
>>>   values :    | user2-name-change | user3-pic-change | user4-status-change | user2-pic-change | user2-status-change |
>>>
>>> If, for example, user2 changes its picture, the row should look like this:
>>>
>>>   user1 row : | lte              | lte -1            | lte -2           | lte -3              | lte -4              |
>>>   values :    | user2-pic-change | user2-name-change | user3-pic-change | user4-status-change | user2-status-change |
>>>
>>> Notice that user2-pic-change, at (lte -3) in the first representation, has moved to (lte) in the second. That way, when user1 connects again, it can retrieve only the information that arrived since the last time it connected. E.g., if user1's last connection date is between lte -2 and lte -3, then it will only be notified that:
>>> - user2 has changed his picture
>>> - user2 has changed his name
>>> - user3 has changed his picture
>>>
>>> I would not keep the old data, since the timeline is saved locally on the client and not on the server. I would really like to avoid searching every column to find the user2-pic-change entry, which can be slow, especially if the user has many friends.
>>>
>>> Is there a simple way to do that with Cassandra, or am I bound to create another CF with the column name holding the action (e.g. user2-pic-change) and the timestamp at which it occurred as the value?
>>>
>>> Thanks,
>>> Morgan.
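Aaron's suggested layout maps directly onto a CQL3 compound primary key (composite columns under the hood). A sketch with illustrative table and column names, assuming CQL3 (Cassandra 1.1+):

```sql
CREATE TABLE user_timeline (
    user_id       text,
    ts            timestamp,
    other_user_id text,
    action        text,
    details       text,
    PRIMARY KEY (user_id, ts, other_user_id, action)
);

-- Everything that happened to user1's friends since the last connection:
SELECT ts, other_user_id, action, details
  FROM user_timeline
 WHERE user_id = 'user1' AND ts > '2012-08-30 12:00:00';
```

Because ts leads the clustering order, the WHERE clause slices on the first composite component only, exactly like the from <1234, NULL, NULL> to <6789, NULL, NULL> range in Aaron's example.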
Re: Store a timeline with uniques properties
Nevermind, it is called composite columns. Thank you for your help.

Morgan.

On 31 August 2012 at 06:05, aaron morton aa...@thelastpickle.com wrote:

> Consider trying…
>
> UserTimeline CF
>   row_key: user_id
>   column_names: <timestamp, other_user_id, action>
>   column_values: action details
>
> To get the changes between two times, specify the start and end timestamps and do not include the other components of the column name, e.g. from <1234, NULL, NULL> to <6789, NULL, NULL>.
>
> Cheers
> - Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> [...]
force gc?
Hi all!

I have a problem using Cassandra. Our application does a lot of overwrites and deletes. If I understand correctly, Cassandra does not actually delete these objects until gc_grace seconds have passed. I tried to force this by setting gc_grace to 0 on an existing column family and running a major compaction afterwards. However, I did not get disk space back, although I'm fairly sure my column family should occupy many times less space. We also have a PostgreSQL database, and we duplicate each data operation in both databases; the PostgreSQL table is much smaller than the corresponding Cassandra column family.

Does anyone have suggestions on how I can analyze this problem? Or maybe I'm doing something wrong, and there is another way to force GC on an existing column family.

Thanks in advance,
Alexander
Re: force gc?
Cassandra, at least historically, did disk cleanup as a side effect of garbage collection, through finalizers. (This is a mistake, for the reason outlined below.)

It is important to understand that you can *never* force a GC in Java. Even calling System.gc() is merely a hint to the VM. What you are doing is telling the VM that you are *willing* to give up some processor time right now for garbage collection; how much it chooses to actually collect, or not collect, is entirely up to the VM. The *only* garbage-collection guarantee in Java is that the VM will make a best effort to collect what it can, in order to avoid an out-of-memory error, at the moment it runs out of memory. You are not guaranteed when, *if ever*, a given object will actually be collected.

Since finalizers run when an object is collected, not when it becomes a candidate for collection, the same is true of finalizers: you are not guaranteed when, if ever, they will run.

On Fri, Aug 31, 2012 at 9:03 AM, Alexander Shutyaev shuty...@gmail.com wrote:

> [...]

--
It's always darkest just before you are eaten by a grue.
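The remedy the above implies is to make cleanup explicit rather than hanging it off finalization. A minimal sketch of deterministic cleanup via reference counting — illustrative names only, not Cassandra's actual SSTable lifecycle code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Deterministic cleanup via reference counting, as an alternative to
// deleting files from a finalizer that may never run.
public class RefCountedFile {
    private final Path path;
    private int refs = 1; // the creator holds the initial reference

    RefCountedFile(Path path) {
        this.path = path;
    }

    synchronized void retain() {
        refs++;
    }

    synchronized void release() {
        if (--refs == 0) {
            try {
                Files.deleteIfExists(path); // deleted here, not "whenever GC runs"
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("sstable-", ".db");
        RefCountedFile f = new RefCountedFile(p);
        f.retain();  // a reader starts using the file
        f.release(); // the reader finishes
        f.release(); // the creator drops its reference -> file removed now
        System.out.println("file still exists: " + Files.exists(p)); // false
    }
}
```

The point of the sketch: the moment of deletion is tied to the last release() call, which the program controls, instead of to a collection cycle that the VM may or may not run.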
Composite row keys with SSTableSimpleUnsortedWriter for Cassandra 1.0?
Hello:

I'm using DataStax Enterprise 2.1, which is based on Cassandra 1.0.10 from what I can tell. For my project, I perform a content build that generates a number of SSTables using SSTableSimpleUnsortedWriter. These are loaded using either JMX or sstableloader, depending on the environment.

I want to introduce a composite row key into some of the generated SSTables, and I will also be referring to these keys using composite column names. I can define the desired composite type and provide it to the SSTableSimpleUnsortedWriter constructor:

  List<AbstractType<?>> compositeList = new ArrayList<AbstractType<?>>();
  compositeList.add(UTF8Type.instance);
  compositeList.add(UTF8Type.instance);
  compositeUtf8Utf8Type = CompositeType.getInstance(compositeList);
  articleWriter = new SSTableSimpleUnsortedWriter(
      cassandraOutputDir, "IngenuityContent", "Articles",
      compositeUtf8Utf8Type, null, 64);

I then figured I could use compositeUtf8Utf8Type when creating composite row keys and column names of the kind I require. Cassandra 1.1.x introduces the CompositeType.Builder class for creating actual composite values, but that's not available to me. I've also seen examples of using Hector's Composite to create composite values. But I need to create these values using the various classes within Cassandra 1.0 itself, to work with SSTableSimpleUnsortedWriter, and I'm not finding any examples of how one does that.

As far as I can tell, composite columns have been around since Cassandra 0.8.x at least. Is the support I need in Cassandra 1.0.x?

Many thanks!

Jeff
--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
(650) 423-1068
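In the absence of CompositeType.Builder on 1.0.x, one option is to serialize composite values by hand. The layout of a CompositeType value is, per component: a 2-byte big-endian length, the component's bytes, then a single end-of-component byte (0). A self-contained sketch, assuming UTF8-typed components — worth checking against the CompositeType source in your exact Cassandra version:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hand-rolled CompositeType encoding for use with SSTableSimpleUnsortedWriter
// on Cassandra 1.0.x, where CompositeType.Builder is not yet available.
public class CompositeEncoder {

    static ByteBuffer encode(byte[]... components) {
        int size = 0;
        for (byte[] c : components) {
            size += 2 + c.length + 1;
        }
        ByteBuffer bb = ByteBuffer.allocate(size);
        for (byte[] c : components) {
            bb.putShort((short) c.length); // component length (big-endian)
            bb.put(c);                     // component value
            bb.put((byte) 0);              // end-of-component marker
        }
        bb.flip();
        return bb;
    }

    public static void main(String[] args) {
        // A two-part UTF8/UTF8 value, matching the compositeUtf8Utf8Type above.
        ByteBuffer rowKey = encode(
                "part1".getBytes(StandardCharsets.UTF_8),
                "part2".getBytes(StandardCharsets.UTF_8));
        System.out.println("encoded bytes: " + rowKey.remaining()); // 16
    }
}
```

The resulting ByteBuffer can then be passed wherever the writer expects a row key or column name of the composite type. (For slice queries, a non-zero end-of-component byte changes range semantics, but for exact values 0 is what Builder itself emits.)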
Cassandra and Apache Drill
Like a lot of folks, I have a need for Big Data and fast queries on that data. Hive queries against Cassandra functionally meet my requirements, but the job-oriented processing is too slow when you need to execute many queries on a small portion of the data. It seems like Apache Drill might be the right answer to this problem. I see HBase mentioned as a possible integration point with Drill, but no mention of Cassandra. Has anyone taken a look at Drill to see how it could access the data in Cassandra?

-John
Re: Cassandra and Apache Drill
I don't think Drill has been accepted into the incubator yet, or has any code. If/when that happens, then it's entirely possible Cassandra could be integrated.

On Fri, Aug 31, 2012 at 4:29 PM, John Onusko jonu...@actiance.com wrote:

> [...]

--
http://twitter.com/tjake
Re: adding node to cluster
On Thu, Aug 30, 2012 at 10:39 PM, Casey Deccio ca...@deccio.net wrote:

>> In what way are the lookups failing? Is there an exception?
>
> No exception -- just failing in that the data should be there, but isn't.

At ConsistencyLevel.ONE or QUORUM? If you are bootstrapping the node, I would expect there to be no chance of serving blank reads like this. As auto_bootstrap is set to true by default, I presume you are bootstrapping. Which node are you querying to get the no-data response?

=Rob

--
=Robert Coli
AIM&GTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb