Re: Memory Usage of a connection

2012-08-31 Thread rohit bhatia
On Fri, Aug 31, 2012 at 11:27 AM, Peter Schuller 
peter.schul...@infidyne.com wrote:

  Could these 500 connections/second cause (on average) 2600 MB of memory
  usage per 2 seconds, i.e. ~1300 MB/second?
  Or, per connection, around 2-3 MB?

 In terms of garbage generated, it's much less about the number of
 connections than about what you're doing with them. Are you, for
 example, requesting large amounts of data? Large or many columns (or
 both), etc. Essentially all working data that your request touches
 is allocated on the heap and contributes to the allocation rate and ParNew
 frequency.


Write requests are simple counter increments on memtables existing in
memory.
There is negligible read traffic (100-200 reads/second).
Also, it is increasing write traffic, with read traffic kept constant, that
increases GC frequency.
So the GC should be independent of reads.


 --
 / Peter Schuller (@scode, http://worldmodscode.wordpress.com)



Re: How to set LeveledCompactionStrategy for an existing table

2012-08-31 Thread Jean-Armel Luce
Hello Aaron.

Thanks for your answer

Jira ticket 4597 created :
https://issues.apache.org/jira/browse/CASSANDRA-4597

Jean-Armel

2012/8/31 aaron morton aa...@thelastpickle.com

 Looks like a bug.

 Can you please create a ticket on
 https://issues.apache.org/jira/browse/CASSANDRA and update the email
 thread ?

 Can you include this: CFPropDefs.applyToCFMetadata() does not set the
 compaction class on CFM

 Thanks


 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 31/08/2012, at 7:05 AM, Jean-Armel Luce jaluc...@gmail.com wrote:

 I tried as you said with cassandra-cli, still without success.

 [default@unknown] use test1;
 Authenticated to keyspace: test1
 [default@test1] UPDATE COLUMN FAMILY pns_credentials with
 compaction_strategy='LeveledCompactionStrategy';
 8ed12919-ef2b-327f-8f57-4c2de26c9d51
 Waiting for schema agreement...
 ... schemas agree across the cluster

 And then, when I check the compaction strategy, it is still
 SizeTieredCompactionStrategy
 [default@test1] describe pns_credentials;
 ColumnFamily: pns_credentials
   Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
   Default column value validator:
 org.apache.cassandra.db.marshal.UTF8Type
   Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
   GC grace seconds: 864000
   Compaction min/max thresholds: 4/32
   Read repair chance: 0.1
   DC Local Read repair chance: 0.0
   Replicate on write: true
   Caching: KEYS_ONLY
   Bloom Filter FP chance: default
   Built indexes: []
   Column Metadata:
 Column Name: isnew
   Validation Class: org.apache.cassandra.db.marshal.Int32Type
 Column Name: ts
   Validation Class: org.apache.cassandra.db.marshal.DateType
 Column Name: mergestatus
   Validation Class: org.apache.cassandra.db.marshal.Int32Type
 Column Name: infranetaccount
   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
 Column Name: user_level
   Validation Class: org.apache.cassandra.db.marshal.Int32Type
 Column Name: msisdn
   Validation Class: org.apache.cassandra.db.marshal.LongType
 Column Name: mergeusertype
   Validation Class: org.apache.cassandra.db.marshal.Int32Type
   Compaction Strategy:
 org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
   Compression Options:
 sstable_compression:
 org.apache.cassandra.io.compress.SnappyCompressor



 I tried also to create a new table with LeveledCompactionStrategy (using
 cqlsh), and when I check the compaction strategy, the
 SizeTieredCompactionStrategy is set for this table.

 cqlsh:test1> CREATE TABLE pns_credentials3 (
  ...   ise text PRIMARY KEY,
  ...   isnew int,
  ...   ts timestamp,
  ...   mergestatus int,
  ...   infranetaccount text,
  ...   user_level int,
  ...   msisdn bigint,
  ...   mergeusertype int
  ... ) WITH
  ...   comment='' AND
  ...   read_repair_chance=0.10 AND
  ...   gc_grace_seconds=864000 AND
  ...   compaction_strategy_class='LeveledCompactionStrategy' AND
  ...
 compression_parameters:sstable_compression='SnappyCompressor';
 cqlsh:test1> describe table pns_credentials3

 CREATE TABLE pns_credentials3 (
   ise text PRIMARY KEY,
   isnew int,
   ts timestamp,
   mergestatus int,
   infranetaccount text,
   user_level int,
   msisdn bigint,
   mergeusertype int
 ) WITH
   comment='' AND
   comparator=text AND
   read_repair_chance=0.10 AND
   gc_grace_seconds=864000 AND
   default_validation=text AND
   min_compaction_threshold=4 AND
   max_compaction_threshold=32 AND
   replicate_on_write='true' AND
   compaction_strategy_class='SizeTieredCompactionStrategy' AND
   compression_parameters:sstable_compression='SnappyCompressor';

 Maybe something is wrong on my server.
 Any idea?

 Thanks.
 Jean-Armel


 2012/8/30 feedly team feedly...@gmail.com

 in cassandra-cli, i did something like:

 update column family xyz with
 compaction_strategy='LeveledCompactionStrategy'


 On Thu, Aug 30, 2012 at 5:20 AM, Jean-Armel Luce jaluc...@gmail.com wrote:


 Hello,

 I am using Cassandra 1.1.1 and CQL3.
 I have a cluster with 1 node (test environment).
 Could you tell me how to set the compaction strategy to LeveledCompactionStrategy
 for an existing table?

 I have a table pns_credentials

 jal@jal-VirtualBox:~/cassandra/apache-cassandra-1.1.1/bin$ ./cqlsh -3
 Connected to Test Cluster at localhost:9160.
 [cqlsh 2.2.0 | Cassandra 1.1.1 | CQL spec 3.0.0 | Thrift protocol
 19.32.0]
 Use HELP for help.
 cqlsh> use test1;
 cqlsh:test1> describe table pns_credentials;

 CREATE TABLE pns_credentials (
   ise text PRIMARY KEY,
   isnew int,
   ts timestamp,
   mergestatus int,
   infranetaccount text,
   user_level int,
   msisdn bigint,
   mergeusertype int
 ) WITH
   comment='' AND
   

Re: Store a timeline with uniques properties

2012-08-31 Thread Morgan Segalis
Hi Aaron,

That's great news... Would you know the name of this feature so I can look 
further into it ?

Thanks,

Morgan. 

On 31 August 2012, at 06:05, aaron morton aa...@thelastpickle.com wrote:

 Consider trying…
 
 UserTimeline CF
 
 row_key: user_id
 column_names: timestamp, other_user_id, action
 column_values: action details
 
 To get the changes between two times specify the start and end timestamps and 
 do not include the other components of the column name. 
 
 e.g. from 1234, NULL, NULL to 6789, NULL, NULL
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 30/08/2012, at 11:32 PM, Morgan Segalis msega...@gmail.com wrote:
 
 Sorry for the diagram, which did not keep the right tabulation for some
 people...
 Here's a version with spaces instead of tabs.
 
 user1 row : |        lte        |      lte -1      |       lte -2        |      lte -3      |        lte -4       |
 values :    | user2-name-change | user3-pic-change | user4-status-change | user2-pic-change | user2-status-change |
 
 If, for example, user2 changes its picture, the row should look like this:
 
 user1 row : |       lte        |       lte -1      |      lte -2      |        lte -3       |        lte -4       |
 values :    | user2-pic-change | user2-name-change | user3-pic-change | user4-status-change | user2-status-change |
 
 On 30 August 2012, at 13:22, Morgan Segalis wrote:
 
 Hi everyone,
 
 I'm trying to use Cassandra to store a timeline, but with values
 that must be unique (replaced). (So not really a timeline, but I didn't find
 a better word for it.)

 Let me give you an example:

 - A user has a list of friends
 - Friends can change their nickname, status, profile picture, etc.

 At the beginning, the CF will look like this for user1:

 lte = latest-timestamp-entry, i.e. the timestamp of the entry (-1, -2, -3
 mean the timestamps are older)
 
 user1 row : |        lte        |      lte -1      |       lte -2        |      lte -3      |        lte -4       |
 values :    | user2-name-change | user3-pic-change | user4-status-change | user2-pic-change | user2-status-change |
 
 If, for example, user2 changes its picture, the row should look like this:
 
 user1 row : |       lte        |       lte -1      |      lte -2      |        lte -3       |        lte -4       |
 values :    | user2-pic-change | user2-name-change | user3-pic-change | user4-status-change | user2-status-change |
 
 Notice that user2-pic-change, which was at (lte -3) in the first
 representation, has moved to (lte) in the second representation.

 That way, when user1 connects again, he can retrieve only the changes that
 occurred since the last time he connected.

 e.g.: if user1's last connection date is between lte -2 and lte -3,
 then he will only be notified that:
 
 - user2 has changed his picture
 - user2 has changed his name
 - user3 has changed his picture
 
 I would not keep the old data, since the timeline is saved locally on the
 client and not on the server.
 I would really like not to have to search every column to find
 user2-pic-change, which can be slow, especially if the user has many
 friends.

 Is there a simple way to do that with Cassandra, or am I bound to create
 another CF, with the column name holding the action (e.g. user2-pic-change)
 and the timestamp at which it occurred as the value?
 
 Thanks,
 
 Morgan.
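Aaron's suggestion above can be illustrated with a plain-Java sketch. This is not Cassandra API: the record type, the names, and the sample data below are invented for illustration. It only mimics how a (timestamp, other_user_id, action) composite comparator orders columns, and how a slice on the first component alone returns the changes since a given time:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TimelineSliceSketch {
    // Stand-in for a composite column name (timestamp, other_user_id, action).
    record Name(long ts, String user, String action) {}

    public static void main(String[] args) {
        // Columns under the user1 row; each friend's latest change is
        // re-inserted under a fresh timestamp, so there is one entry per event.
        List<Name> columns = new ArrayList<>(List.of(
                new Name(1000, "user2", "status-change"),
                new Name(2000, "user2", "pic-change"),
                new Name(3000, "user4", "status-change"),
                new Name(4000, "user3", "pic-change"),
                new Name(5000, "user2", "name-change")));

        // A composite comparator orders component by component.
        columns.sort(Comparator.comparingLong(Name::ts)
                .thenComparing(Name::user)
                .thenComparing(Name::action));

        // Slice on the first component only: everything since the last visit.
        long lastSeen = 2500, now = 5000;
        for (Name n : columns) {
            if (n.ts() > lastSeen && n.ts() <= now) {
                System.out.println(n.user() + " " + n.action());
            }
        }
    }
}
```

The point of the layout is that a time-range slice on the first component needs no per-column scan, which addresses the "search every column" concern above.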
 


Re: Store a timeline with uniques properties

2012-08-31 Thread Morgan Segalis
Nevermind, it is called composite columns. 

Thank you for your help. 

Morgan. 

On 31 August 2012, at 06:05, aaron morton aa...@thelastpickle.com wrote:

 Consider trying…
 
 UserTimeline CF
 
 row_key: user_id
 column_names: timestamp, other_user_id, action
 column_values: action details
 
 To get the changes between two times specify the start and end timestamps and 
 do not include the other components of the column name. 
 
 e.g. from 1234, NULL, NULL to 6789, NULL, NULL
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 


force gc?

2012-08-31 Thread Alexander Shutyaev
Hi All!

I have a problem using Cassandra. Our application does a lot of
overwrites and deletes. If I understand correctly, Cassandra does not
actually delete these objects until gc_grace seconds have passed. I tried
to force gc by setting gc_grace to 0 on an existing column family and
running a major compaction afterwards. However, I did not get disk space
back, although I'm fairly sure my column family should occupy many times
less space. We also have a PostgreSQL db and we duplicate each data
operation in both dbs, and the PostgreSQL table is much smaller than the
corresponding Cassandra column family. Does anyone have any suggestions on
how I can analyze my problem? Or maybe I'm doing something wrong and there
is another way to force gc on an existing column family.

Thanks in advance,
Alexander


Re: force gc?

2012-08-31 Thread Jeffrey Kesselman
Cassandra, at least, used to do disk cleanup as a side effect of
garbage collection, through finalizers.  (This is a mistake, for the
reason outlined below.)

It is important to understand that you can *never* force a gc in Java.
Even calling System.gc() is merely a hint to the VM. What you are doing is
telling the VM that you are *willing* to give up some processor time right
now to gc; how much it chooses to actually collect, or not collect, is
totally up to the VM.

The *only* garbage collection guarantee in Java is that it will make a
best effort to collect what it can to avoid an OutOfMemoryError at
the moment it runs out of memory.  You are not guaranteed when, *if
ever*, a given object will actually be collected.  Since finalizers run
when an object is collected, not when it becomes a candidate for
collection, the same is true of finalizers: you are
not guaranteed when, if ever, they will run.





-- 
It's always darkest just before you are eaten by a grue.
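The hint-only nature of System.gc() can be seen in a minimal, self-contained sketch (the class name is invented for illustration; whether the weak reference is actually cleared depends on the VM and on flags such as -XX:+DisableExplicitGC):

```java
import java.lang.ref.WeakReference;

public class GcHintDemo {
    public static void main(String[] args) {
        // The referent is only weakly reachable, so it is immediately
        // eligible for collection.
        WeakReference<Object> ref = new WeakReference<>(new Object());

        System.gc(); // a request, not a command; the VM may ignore it

        // On most VMs with default flags this prints true, but nothing in
        // the language specification guarantees it.
        System.out.println("cleared after hint: " + (ref.get() == null));
    }
}
```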


Composite row keys with SSTableSimpleUnsortedWriter for Cassandra 1.0?

2012-08-31 Thread Jeff Schmidt
Hello:

I'm using DataStax Enterprise 2.1, which is based on Cassandra 1.0.10 from what 
I can tell.  For my project, I perform a content build that generates a number 
of SSTables using SSTableSimpleUnsortedWriter. These are loaded using either 
JMX or sstableloader depending on the environment.

I want to introduce a composite row key into some of the generated SSTables.  
Also, I will be referring to these keys by using composite column names.

I can define the desired composite type and provide it to the 
SSTableSimpleUnsortedWriter constructor:

List<AbstractType<?>> compositeList = new ArrayList<AbstractType<?>>();
compositeList.add(UTF8Type.instance);
compositeList.add(UTF8Type.instance);
compositeUtf8Utf8Type = CompositeType.getInstance(compositeList);

articleWriter = new SSTableSimpleUnsortedWriter(
    cassandraOutputDir,
    "IngenuityContent",
    "Articles",
    compositeUtf8Utf8Type,
    null,
    64);

I then figured I could use compositeUtf8Utf8Type when creating composite row 
keys and column names of the kind I require.  Cassandra 1.1.x introduces the 
CompositeType.Builder class for creating actual composite values, but that's 
not available to me.  I've also  seen examples of using Hector's Composite to 
create composite values.

But, I need to create these values using the various classes within Cassandra 
1.0 itself to work with SSTableSimpleUnsortedWriter. For that, I'm not finding 
any examples on how one does that.

As far as I can tell, composite columns at least have been around since 
Cassandra 0.8.x?  Is there the support I need in Cassandra 1.0.x?

Many thanks!

Jeff
--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
(650) 423-1068
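For what it's worth, the serialized composite layout is simple enough to build by hand while on 1.0: each component is written as a 2-byte big-endian length, the raw component bytes, then a single end-of-component byte (0 for a concrete row key or column name). The sketch below is illustrative only, with invented class and method names, and assumes UTF-8 string components; verify the layout against CompositeType in your Cassandra 1.0.x source before relying on it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CompositeSketch {
    // Serialize UTF-8 string components in the composite layout:
    // per component: 2-byte big-endian length, the bytes, then one
    // end-of-component byte (0 for a concrete row key or column name).
    public static ByteBuffer compose(String... components) {
        int size = 0;
        for (String c : components) {
            size += 2 + c.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        ByteBuffer bb = ByteBuffer.allocate(size);
        for (String c : components) {
            byte[] bytes = c.getBytes(StandardCharsets.UTF_8);
            bb.putShort((short) bytes.length); // component length
            bb.put(bytes);                     // component value
            bb.put((byte) 0);                  // end-of-component marker
        }
        bb.flip();
        return bb;
    }

    public static void main(String[] args) {
        // Two 5-byte components: (2 + 5 + 1) * 2 = 16 bytes total.
        System.out.println(compose("part1", "part2").remaining());
    }
}
```

The resulting ByteBuffer can be passed as the row key to SSTableSimpleUnsortedWriter.newRow(), or used as a column name, since the writer treats both as opaque ByteBuffers.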


Cassandra and Apache Drill

2012-08-31 Thread John Onusko
Like a lot of folks, I have a need for Big Data and fast queries on that data. 
Hive queries against Cassandra functionally meet my requirements, but the 
job-oriented processing is too slow when you need to execute many queries on a 
small portion of the data. It seems like Apache Drill might be the right answer 
to this problem. I see HBase mentioned as a possible integration point with 
Drill, but no mention of Cassandra. Has anyone taken a look at Drill to see how 
it could access the data in Cassandra?

-John


Re: Cassandra and Apache Drill

2012-08-31 Thread Jake Luciani
I don't think Drill has been accepted into the incubator yet or has any
code.

If/When that happens then it's entirely possible Cassandra could be
integrated.

On Fri, Aug 31, 2012 at 4:29 PM, John Onusko jonu...@actiance.com wrote:

 Like a lot of folks, I have a need for Big Data and fast queries on that
 data. Hive queries against Cassandra functionally meet my requirements, but
 the job oriented processing is too slow when you need to execute many
 queries on a small portion of the data. It seems like Apache Drill might be
 the right answer to this problem. I see HBase mentioned as a possible
 integration point with Drill, but no mention of Cassandra. Has anyone taken
 a look at Drill to see how it could access the data in Cassandra?


 -John




-- 
http://twitter.com/tjake


Re: adding node to cluster

2012-08-31 Thread Rob Coli
On Thu, Aug 30, 2012 at 10:39 PM, Casey Deccio ca...@deccio.net wrote:
 In what way are the lookups failing? Is there an exception?

 No exception--just failing in that the data should be there, but isn't.

At ConsistencyLevel.ONE or QUORUM?

If you are bootstrapping the node, I would expect there to be no
chance of serving blank reads like this. As auto_bootstrap is set to
true by default, I presume you are bootstrapping.

Which node are you querying to get the no data response?

=Rob

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb