Re: Can I create a counter column family with many rows in 1.1.10?

2013-03-06 Thread Alain RODRIGUEZ
What would be the exact CQL3 syntax to create a counter CF with composite
row key and not predefined column names ?

Is the following supposed to work ?

CREATE TABLE composite_counter (
   aid   text,
   key1  text,
   key2  text,
   key3  text,
   value counter,
   PRIMARY KEY (aid, key1, key2, key3)
)

First, when I do so I have no error shown, but I *can't* see this CF appear
in my OpsCenter.

update composite_counter set value = value + 5 where aid = '1' and key1 =
'test1' and key2 = 'test2' and key3 = 'test3'; works as expected too.

But how can I have multiple counter columns using the schemaless property
of cassandra ? I mean before, when I created counter CF with cli, things
like this used to work:
update composite_counter set 'value2' = 'value2' + 5 where aid = '1' and
key1 = 'test1' and key2 = 'test2' and key3 = 'test3'; = Bad Request: line
1:29 no viable alternative at input 'value2'

I also tried:
update composite_counter set value2 = value2 + 5 where aid = '1' and key1
= 'test1' and key2 = 'test2' and key3 = 'test3';   = Bad Request: Unknown
identifier value2 (as expected I guess)

I want to make a counter CF with composite keys and a lot of counters using
this pattern 20130306#event or (20130306, event), not sure if I should
use composite columns there.

Is it mandatory to create the CF with at least one column with the
counter type ? I mean I will probably never use a column named 'value', I
defined it just to be sure the CF is defined as a counter CF.
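One CQL3 pattern that gives many counters per row without predefining
column names (a sketch, not taken from this thread; table and column names
are illustrative) is to move the would-be column name into the PRIMARY
KEY, so each (day, event) pair addresses its own counter:

CREATE TABLE composite_counter2 (
   aid   text,
   day   text,     -- e.g. '20130306'
   event text,     -- e.g. 'login'
   value counter,
   PRIMARY KEY (aid, day, event)
);

UPDATE composite_counter2 SET value = value + 5
 WHERE aid = '1' AND day = '20130306' AND event = 'login';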




2013/3/6 Abhijit Chanda abhijit.chan...@gmail.com

 Thanks @aaron  for the rectification


 On Wed, Mar 6, 2013 at 1:17 PM, aaron morton aa...@thelastpickle.com wrote:

 Note that CQL 3 in 1.1 is not compatible with CQL 3 in 1.2. Also you do not
 have to use CQL 3, you can still use the cassandra-cli to create CF's.

 The syntax you use to populate it depends on the client you are using.

 Cheers

-
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 5/03/2013, at 9:16 PM, Abhijit Chanda abhijit.chan...@gmail.com
 wrote:

 Yes you can, you just have to use CQL3, and from 1.1.10 onward cassandra
 supports CQL3.  Just you have to be aware of the fact that a column family
 that contains a counter column can only contain counters. In other
 words, either all the columns of the column family excluding KEY have the
 counter type or none of them can have it.

 Best Regards,
 --
 Abhijit Chanda
 +91-974395





 --
 Abhijit Chanda
 +91-974395



RE: Can I create a counter column family with many rows in 1.1.10?

2013-03-06 Thread Mateus Ferreira e Freitas

Ah, it's with many columns, not rows. I use this in CQL 2-3: create table cnt 
(key text PRIMARY KEY, y2003 counter, y2004 counter); it says this is not a 
counter column family, and if I try to use 
default_validation_class=CounterType, it says this is not a valid keyword. What 
am I supposed to type in order to create it?
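A sketch of the CQL3 route suggested in the replies, assuming Cassandra
1.1's cqlsh -3 switch to enable CQL3 (an all-counter table is valid there):

cqlsh -3
CREATE TABLE cnt (key text PRIMARY KEY, y2003 counter, y2004 counter);
UPDATE cnt SET y2003 = y2003 + 1 WHERE key = 'k1';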

From: aa...@thelastpickle.com
Subject: Re: Can I create a counter column family with many rows in 1.1.10?
Date: Tue, 5 Mar 2013 23:47:38 -0800
To: user@cassandra.apache.org

Note that CQL 3 in 1.1 is not compatible with CQL 3 in 1.2. Also you do not have 
to use CQL 3, you can still use the cassandra-cli to create CF's. 
The syntax you use to populate it depends on the client you are using. 
Cheers 

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com



On 5/03/2013, at 9:16 PM, Abhijit Chanda abhijit.chan...@gmail.com wrote: Yes 
you can, you just have to use CQL3, and from 1.1.10 onward cassandra supports CQL3.  
Just you have to be aware of the fact that a column family that contains a counter 
column can only contain counters. In other words, either all the columns 
of the column family excluding KEY have the counter type or none of them can 
have it.

Best Regards,
-- 
Abhijit Chanda
+91-974395



  

RE: Can I create a counter column family with many rows in 1.1.10?

2013-03-06 Thread Mateus Ferreira e Freitas

I got it now.

From: mateus.ffrei...@hotmail.com
To: user@cassandra.apache.org
Subject: RE: Can I create a counter column family with many rows in 1.1.10?
Date: Wed, 6 Mar 2013 08:42:37 -0300





Ah, it's with many columns, not rows. I use this in CQL 2-3: create table cnt 
(key text PRIMARY KEY, y2003 counter, y2004 counter); it says this is not a 
counter column family, and if I try to use 
default_validation_class=CounterType, it says this is not a valid keyword. What 
am I supposed to type in order to create it?

From: aa...@thelastpickle.com
Subject: Re: Can I create a counter column family with many rows in 1.1.10?
Date: Tue, 5 Mar 2013 23:47:38 -0800
To: user@cassandra.apache.org

Note that CQL 3 in 1.1 is not compatible with CQL 3 in 1.2. Also you do not have 
to use CQL 3, you can still use the cassandra-cli to create CF's. 
The syntax you use to populate it depends on the client you are using. 
Cheers 

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com



On 5/03/2013, at 9:16 PM, Abhijit Chanda abhijit.chan...@gmail.com wrote: Yes 
you can, you just have to use CQL3, and from 1.1.10 onward cassandra supports CQL3.  
Just you have to be aware of the fact that a column family that contains a counter 
column can only contain counters. In other words, either all the columns 
of the column family excluding KEY have the counter type or none of them can 
have it.

Best Regards,
-- 
Abhijit Chanda
+91-974395




  

Re: anyone see this user-cassandra thread get answered...

2013-03-06 Thread Alain RODRIGUEZ
Wow, that's quite new... Threadjack to ask how to unsubscribe, amazing.

Help yourself: https://www.google.com/search?q=unsubscribe+cassandra

Any of the first results should help you. Goodbye !
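For reference, the standard ASF list convention (an assumption here, not
quoted from those search results):

echo unsubscribe | mail -s unsubscribe user-unsubscribe@cassandra.apache.org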


2013/3/6 deepansh jain deepanshcri...@gmail.com

 how to unsubscribe from mailing list


 On Wed, Mar 6, 2013 at 1:06 PM, aaron morton aa...@thelastpickle.com wrote:

 bah, think I got confused by looking at the version in the email you
 linked to.

 if the update CF call is not working, and this is QA, run it  with DEBUG
 logging and file a bug here
 https://issues.apache.org/jira/browse/CASSANDRA

 Thanks

-
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 5/03/2013, at 8:29 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

 That ticket says it was fixed in 1.1.5 and we are on 1.2.2.  We upgraded
 from 1.1.4 to 1.2.2, ran upgrade tables and watched filenames change from
 *-he-*.db to *-id-*.db, then changed compaction strategies and still had
 this issue.  Is it the fact we came from 1.1.4?  Ours was a very simple 4
 node QA test where we setup a 1.1.4 cluster, put data in, upgraded, then
 upgraded tables, then switched to LCS and run upgrade tables again hoping
 it would use LCS.

 Thanks,
 Dean

 From: aaron morton aa...@thelastpickle.com
 Reply-To: user@cassandra.apache.org
 Date: Tuesday, March 5, 2013 9:13 AM
 To: user@cassandra.apache.org
 Subject: Re: anyone see this user-cassandra thread get answered...

 Was probably this https://issues.apache.org/jira/browse/CASSANDRA-4597

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 4/03/2013, at 2:05 PM, Hiller, Dean dean.hil...@nrel.gov wrote:

 I was reading

 http://mail-archives.apache.org/mod_mbox/cassandra-user/201208.mbox/%3CCAGZm5drRh3VXNpHefR9UjH8H=dhad2y18s0xmam5cs4yfl5...@mail.gmail.com%3E
 As we are having the same issue in 1.2.2.  We modify to LCS and
 cassandra-cli shows us at LCS on any node we run cassandra cli on, but then
 looking at cqlsh, it is showing us at SizeTieredCompactionStrategy :(.

 Thanks,
 Dean
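For reference, both ways of setting LCS side by side (a sketch; keyspace
and CF names are illustrative):

# cassandra-cli:
#   update column family mycf with
#     compaction_strategy = 'LeveledCompactionStrategy';
# CQL3 on 1.2:
cqlsh <<'EOF'
ALTER TABLE myks.mycf
  WITH compaction = { 'class' : 'LeveledCompactionStrategy' };
EOF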






should I file a bug report on this or is this normal?

2013-03-06 Thread Hiller, Dean
I ran a pretty solid QA test (cleaned data from scratch) on version 1.2.2

My test was as so

 1.  Start up 4 node cassandra cluster
 2.  Populate with initial test data (no other data is added to system after 
this point!!!)
 3.  Run nodetool drain on every node (move stuff from commit log to sstables)
 4.  Stop and start cassandra cluster to have it running again
 5.  Get size of nreldata CF folder is 128kB
 6.  Go to node 3, run snapshot and mv snapshots directory OUT of nreldata
 7.  Get size of nreldata CF folder is 128kB
 8.  On node 3, run nodetool drain
 9.  Get size of nreldata CF folder is still 128kB
 10. Stop cassandra node
 11. Rm keyspace/nreldata/*.db
 12. Size of nreldata CF is 8kB (odd for an empty folder, but OK)
 13. Start cassandra
 14. Nodetool repair databus5 nreldata
 15. Size of nreldata is now 220K ….it has exploded in size!!

I ran this QA test as we see data size explosion in production as well (I can't 
be 100% sure if this is the same thing though, as above is such a small data 
set).  Would leveled compaction be a bit more stable in terms of size ratios 
and such?
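A sketch of the per-node steps above as shell commands (data path and
keyspace/CF names assumed from the listings below):

nodetool drain
nodetool snapshot databus5
mv /var/lib/cassandra/data/databus5/nreldata/snapshots /tmp/
du -sh /var/lib/cassandra/data/databus5/nreldata
nodetool repair databus5 nreldata
du -sh /var/lib/cassandra/data/databus5/nreldata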

QUESTIONS

 1.  Why is the bloom filter for level 5 a total of 3856 bytes for 29118 (large 
to small) bytes of data, while in the initial data it was 2192 bytes for 
43038 (small to large) bytes of data?
 2.  Why are there 3 levels?  With such a small set of data, I would think it 
would flush one data file like the original data, but instead there are 3 files.

My files after repair have levels 5, 6, and 7.  My files before deletion of the 
CF have just level 1.  After repair files are
-rw-rw-r--.  1 cassandra cassandra    54 Mar  6 07:18 databus5-nreldata-ib-5-CompressionInfo.db
-rw-rw-r--.  1 cassandra cassandra 29118 Mar  6 07:18 databus5-nreldata-ib-5-Data.db
-rw-rw-r--.  1 cassandra cassandra  3856 Mar  6 07:18 databus5-nreldata-ib-5-Filter.db
-rw-rw-r--.  1 cassandra cassandra 37000 Mar  6 07:18 databus5-nreldata-ib-5-Index.db
-rw-rw-r--.  1 cassandra cassandra  4772 Mar  6 07:18 databus5-nreldata-ib-5-Statistics.db
-rw-rw-r--.  1 cassandra cassandra   383 Mar  6 07:18 databus5-nreldata-ib-5-Summary.db
-rw-rw-r--.  1 cassandra cassandra    79 Mar  6 07:18 databus5-nreldata-ib-5-TOC.txt
-rw-rw-r--.  1 cassandra cassandra    46 Mar  6 07:18 databus5-nreldata-ib-6-CompressionInfo.db
-rw-rw-r--.  1 cassandra cassandra 14271 Mar  6 07:18 databus5-nreldata-ib-6-Data.db
-rw-rw-r--.  1 cassandra cassandra   816 Mar  6 07:18 databus5-nreldata-ib-6-Filter.db
-rw-rw-r--.  1 cassandra cassandra 18248 Mar  6 07:18 databus5-nreldata-ib-6-Index.db
-rw-rw-r--.  1 cassandra cassandra  4756 Mar  6 07:18 databus5-nreldata-ib-6-Statistics.db
-rw-rw-r--.  1 cassandra cassandra   230 Mar  6 07:18 databus5-nreldata-ib-6-Summary.db
-rw-rw-r--.  1 cassandra cassandra    79 Mar  6 07:18 databus5-nreldata-ib-6-TOC.txt
-rw-rw-r--.  1 cassandra cassandra    46 Mar  6 07:18 databus5-nreldata-ib-7-CompressionInfo.db
-rw-rw-r--.  1 cassandra cassandra 14271 Mar  6 07:18 databus5-nreldata-ib-7-Data.db
-rw-rw-r--.  1 cassandra cassandra   816 Mar  6 07:18 databus5-nreldata-ib-7-Filter.db
-rw-rw-r--.  1 cassandra cassandra 18248 Mar  6 07:18 databus5-nreldata-ib-7-Index.db
-rw-rw-r--.  1 cassandra cassandra  4756 Mar  6 07:18 databus5-nreldata-ib-7-Statistics.db
-rw-rw-r--.  1 cassandra cassandra   230 Mar  6 07:18 databus5-nreldata-ib-7-Summary.db
-rw-rw-r--.  1 cassandra cassandra    79 Mar  6 07:18 databus5-nreldata-ib-7-TOC.txt

Before repair files (from my moved snapshot, as I moved it out of the directory 
so cassandra no longer had it)…
-rw-rw-r--. 1 cassandra cassandra    62 Mar  6 07:11 databus5-nreldata-ib-1-CompressionInfo.db
-rw-rw-r--. 1 cassandra cassandra 43038 Mar  6 07:11 databus5-nreldata-ib-1-Data.db
-rw-rw-r--. 1 cassandra cassandra  2192 Mar  6 07:11 databus5-nreldata-ib-1-Filter.db
-rw-rw-r--. 1 cassandra cassandra 55248 Mar  6 07:11 databus5-nreldata-ib-1-Index.db
-rw-rw-r--. 1 cassandra cassandra  4756 Mar  6 07:11 databus5-nreldata-ib-1-Statistics.db
-rw-rw-r--. 1 cassandra cassandra   499 Mar  6 07:11 databus5-nreldata-ib-1-Summary.db
-rw-rw-r--. 1 cassandra cassandra    79 Mar  6 07:11 databus5-nreldata-ib-1-TOC.txt

Thanks,
Dean



Re: Cassandra instead of memcached

2013-03-06 Thread Edward Capriolo
http://www.slideshare.net/edwardcapriolo/cassandra-as-memcache

Read at ONE.
READ_REPAIR_CHANCE as low as possible.

Use short TTL and short GC_GRACE.

Make the in memory memtable size as high as possible to avoid flushing and
compacting.

Optionally turn off commit log.

You can use cassandra like memcache but it is not a memcache replacement.
Cassandra persists writes and compacts SSTables, memcache only has to keep
data in memory.

If you want to try a crazy idea, try putting your persistent data on a ram
disk! Not data/system however!
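Those knobs in one place (a sketch in CQL3 on 1.2; names and values are
illustrative, not recommendations):

cqlsh <<'EOF'
CREATE KEYSPACE cache WITH replication =
    { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }
  AND durable_writes = false;    -- skip the commit log entirely
CREATE TABLE cache.kv (
  k text PRIMARY KEY,
  v text
) WITH gc_grace_seconds = 0;     -- drop tombstones immediately
INSERT INTO cache.kv (k, v) VALUES ('k1', 'v1') USING TTL 60;
EOF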






On Wed, Mar 6, 2013 at 2:45 AM, aaron morton aa...@thelastpickle.com wrote:

 consider disabling durable_writes in the KS config to remove writing to
 the commit log. That will speed things up for you. Note that you risk
 losing data if cassandra crashes or is not shut down with nodetool drain.

 Even if you set the gc_grace to 0, deletes will still need to be committed
 to disk.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 5/03/2013, at 9:51 AM, Drew Kutcharian d...@venarc.com wrote:

 Thanks Ben, that article was actually the reason I started thinking about
 removing memcached.

 I wanted to see what would be the optimum config to use C* as an in-memory
 store.

 -- Drew


 On Mar 5, 2013, at 2:39 AM, Ben Bromhead b...@instaclustr.com wrote:

 Check out
 http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html

 Netflix used Cassandra with SSDs and were able to drop their memcache
 layer. Mind you they were not using it purely as an in memory KV store.

 Ben
 Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr



 On 05/03/2013, at 4:33 PM, Drew Kutcharian d...@venarc.com wrote:

 Hi Guys,

 I'm thinking about using Cassandra as an in-memory key/value store instead
 of memcached for a new project (just to get rid of a dependency if
 possible). I was thinking about setting the replication factor to 1,
 enabling off-heap row-cache and setting gc_grace_period to zero for the CF
 that will be used for the key/value store.

 Has anyone tried this? Any comments?

 Thanks,

 Drew








Re: Can I create a counter column family with many rows in 1.1.10?

2013-03-06 Thread aaron morton
If you have one column in the table that is not part of the primary key and is 
a counter, then all columns that are not part of the primary key must also be a 
counter. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/03/2013, at 2:56 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 What would be the exact CQL3 syntax to create a counter CF with composite row 
 key and not predefined column names ?
 
 Is the following supposed to work ?
 
 CREATE TABLE composite_counter (
aid   text,
key1  text,
key2  text,
key3  text,
value counter,
PRIMARY KEY (aid, key1, key2, key3)
 )
 
 First, when I do so I have no error shown, but I *can't* see this CF appear 
 in my OpsCenter.
 
 update composite_counter set value = value + 5 where aid = '1' and key1 = 
 'test1' and key2 = 'test2' and key3 = 'test3'; works as expected too.
 
 But how can I have multiple counter columns using the schemaless property of 
 cassandra ? I mean before, when I created counter CF with cli, things like 
 this used to work:
 update composite_counter set 'value2' = 'value2' + 5 where aid = '1' and 
 key1 = 'test1' and key2 = 'test2' and key3 = 'test3'; = Bad Request: line 
 1:29 no viable alternative at input 'value2'
 
 I also tried:
 update composite_counter set value2 = value2 + 5 where aid = '1' and key1 = 
 'test1' and key2 = 'test2' and key3 = 'test3';   = Bad Request: Unknown 
 identifier value2 (as expected I guess)
 
 I want to make a counter CF with composite keys and a lot of counters using 
 this pattern 20130306#event or (20130306, event), not sure if I should 
 use composite columns there.
 
 Is it mandatory to create the CF with at least one column with the counter 
 type ? I mean I will probably never use a column named 'value', I defined it 
 just to be sure the CF is defined as a counter CF.
 
 
 
 
 2013/3/6 Abhijit Chanda abhijit.chan...@gmail.com
 Thanks @aaron  for the rectification
 
 
 On Wed, Mar 6, 2013 at 1:17 PM, aaron morton aa...@thelastpickle.com wrote:
 Note that CQL 3 in 1.1 is not compatible with CQL 3 in 1.2. Also you do not have 
 to use CQL 3, you can still use the cassandra-cli to create CF's. 
 
 The syntax you use to populate it depends on the client you are using. 
 
 Cheers
  
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 5/03/2013, at 9:16 PM, Abhijit Chanda abhijit.chan...@gmail.com wrote:
 
 Yes you can, you just have to use CQL3, and from 1.1.10 onward cassandra supports 
 CQL3.  Just you have to be aware of the fact that a column family that contains 
 a counter column can only contain counters. In other words, either all 
 the columns of the column family excluding KEY have the counter type or none 
 of them can have it.
 
 Best Regards,
 -- 
 Abhijit Chanda
 +91-974395
 
 
 
 
 -- 
 Abhijit Chanda
 +91-974395
 



Re: should I file a bug report on this or is this normal?

2013-03-06 Thread aaron morton
 15. Size of nreldata is now 220K ….it has exploded in size!!
This may be explained by fragmentation in the sstables, which compaction would 
eventually resolve.

During repair the data came from multiple nodes and created multiple sstables 
for each CF. Streaming copies part of an SSTable on the source and creates an 
SSTable on the destination. This pattern is different to all writes for a CF 
going to the same sstable when flushed. 

To compare apples to apples run a major compaction after the initial data load, 
and after the repair. 
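For example (a sketch; keyspace/CF names from the test above):

nodetool compact databus5 nreldata
du -sh /var/lib/cassandra/data/databus5/nreldata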
 
 1.  Why is the bloom filter for level 5 a total of 3856 bytes for 29118 (large 
 to small) bytes of data, while in the initial data it was 2192 bytes for 
 43038 (small to large) bytes of data?
The size of the BF depends on the number of rows and the false positive rate. 
Not the size of the -Data.db component on disk. 
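For reference, the standard Bloom filter sizing rule (not from this
thread): n rows at false-positive rate p need about
m = -n * ln(p) / (ln 2)^2 bits, and each sstable carries its own filter,
so three sstables pay that overhead three times. A quick check with
illustrative numbers:

awk 'BEGIN { n = 1000; p = 0.01;
             bits = -n * log(p) / (log(2) ^ 2);
             printf "%.0f bytes\n", bits / 8 }'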
 
 2.  Why are there 3 levels?  With such a small set of data, I would think it 
 would flush one data file like the original data, but instead there are 3 files.
See above. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/03/2013, at 6:40 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

 I ran a pretty solid QA test (cleaned data from scratch) on version 1.2.2
 
 My test was as so
 
 1.  Start up 4 node cassandra cluster
 2.  Populate with initial test data (no other data is added to system after 
 this point!!!)
 3.  Run nodetool drain on every node (move stuff from commit log to sstables)
 4.  Stop and start cassandra cluster to have it running again
 5.  Get size of nreldata CF folder is 128kB
 6.  Go to node 3, run snapshot and mv snapshots directory OUT of nreldata
 7.  Get size of nreldata CF folder is 128kB
 8.  On node 3, run nodetool drain
 9.  Get size of nreldata CF folder is still 128kB
 10. Stop cassandra node
 11. Rm keyspace/nreldata/*.db
 12. Size of nreldata CF is 8kB (odd for an empty folder, but OK)
 13. Start cassandra
 14. Nodetool repair databus5 nreldata
 15. Size of nreldata is now 220K ….it has exploded in size!!
 
 I ran this QA test as we see data size explosion in production as well (I 
 can't be 100% sure if this is the same thing though, as above is such a small 
 data set).  Would leveled compaction be a bit more stable in terms of size 
 ratios and such?
 
 QUESTIONS
 
 1.  Why is the bloom filter for level 5 a total of 3856 bytes for 29118 (large 
 to small) bytes of data, while in the initial data it was 2192 bytes for 
 43038 (small to large) bytes of data?
 2.  Why are there 3 levels?  With such a small set of data, I would think it 
 would flush one data file like the original data, but instead there are 3 files.
 
 My files after repair have levels 5, 6, and 7.  My files before deletion of 
 the CF have just level 1.  After repair files are
 -rw-rw-r--.  1 cassandra cassandra    54 Mar  6 07:18 databus5-nreldata-ib-5-CompressionInfo.db
 -rw-rw-r--.  1 cassandra cassandra 29118 Mar  6 07:18 databus5-nreldata-ib-5-Data.db
 -rw-rw-r--.  1 cassandra cassandra  3856 Mar  6 07:18 databus5-nreldata-ib-5-Filter.db
 -rw-rw-r--.  1 cassandra cassandra 37000 Mar  6 07:18 databus5-nreldata-ib-5-Index.db
 -rw-rw-r--.  1 cassandra cassandra  4772 Mar  6 07:18 databus5-nreldata-ib-5-Statistics.db
 -rw-rw-r--.  1 cassandra cassandra   383 Mar  6 07:18 databus5-nreldata-ib-5-Summary.db
 -rw-rw-r--.  1 cassandra cassandra    79 Mar  6 07:18 databus5-nreldata-ib-5-TOC.txt
 -rw-rw-r--.  1 cassandra cassandra    46 Mar  6 07:18 databus5-nreldata-ib-6-CompressionInfo.db
 -rw-rw-r--.  1 cassandra cassandra 14271 Mar  6 07:18 databus5-nreldata-ib-6-Data.db
 -rw-rw-r--.  1 cassandra cassandra   816 Mar  6 07:18 databus5-nreldata-ib-6-Filter.db
 -rw-rw-r--.  1 cassandra cassandra 18248 Mar  6 07:18 databus5-nreldata-ib-6-Index.db
 -rw-rw-r--.  1 cassandra cassandra  4756 Mar  6 07:18 databus5-nreldata-ib-6-Statistics.db
 -rw-rw-r--.  1 cassandra cassandra   230 Mar  6 07:18 databus5-nreldata-ib-6-Summary.db
 -rw-rw-r--.  1 cassandra cassandra    79 Mar  6 07:18 databus5-nreldata-ib-6-TOC.txt
 -rw-rw-r--.  1 cassandra cassandra    46 Mar  6 07:18 databus5-nreldata-ib-7-CompressionInfo.db
 -rw-rw-r--.  1 cassandra cassandra 14271 Mar  6 07:18 databus5-nreldata-ib-7-Data.db
 -rw-rw-r--.  1 cassandra cassandra   816 Mar  6 07:18 databus5-nreldata-ib-7-Filter.db
 -rw-rw-r--.  1 cassandra cassandra 18248 Mar  6 07:18 databus5-nreldata-ib-7-Index.db
 -rw-rw-r--.  1 cassandra cassandra  4756 Mar  6 07:18 databus5-nreldata-ib-7-Statistics.db
 -rw-rw-r--.  1 cassandra cassandra   230 Mar  6 07:18 databus5-nreldata-ib-7-Summary.db
 -rw-rw-r--.  1 cassandra cassandra    79 Mar  6 07:18 databus5-nreldata-ib-7-TOC.txt
 
 Before repair files (from my moved snapshot, as I moved it out of the directory 
 so cassandra no longer had it)…
 -rw-rw-r--. 1 cassandra cassandra    62 Mar  6 07:11 databus5-nreldata-ib-1-CompressionInfo.db
 -rw-rw-r--. 1 

Re: should I file a bug report on this or is this normal?

2013-03-06 Thread Hiller, Dean
Thanks for the great info, I will give it a go.

1 question though, my false positive rate and number of rows are not changing, so 
why is the bloom filter bigger?  Or do you mean the bloom filter is not based on 
the number of rows in the table but on how the rows are spread through the 
sstable files?

Ie. I have the same amount of rows before and after in that specific column 
family.


Thanks,
Dean

From: aaron morton aa...@thelastpickle.com
Reply-To: user@cassandra.apache.org
Date: Wednesday, March 6, 2013 9:29 AM
To: user@cassandra.apache.org
Subject: Re: should I file a bug report on this or is this normal?

15. Size of nreldata is now 220K ….it has exploded in size!!
This may be explained by fragmentation in the sstables, which compaction would 
eventually resolve.

During repair the data came from multiple nodes and created multiple sstables 
for each CF. Streaming copies part of an SSTable on the source and creates an 
SSTable on the destination. This pattern is different to all writes for a CF 
going to the same sstable when flushed.

To compare apples to apples run a major compaction after the initial data load, 
and after the repair.

1.  Why is the bloom filter for level 5 a total of 3856 bytes for 29118 (large to 
small) bytes of data, while in the initial data it was 2192 bytes for 
43038 (small to large) bytes of data?
The size of the BF depends on the number of rows and the false positive rate. 
Not the size of the -Data.db component on disk.

2.  Why are there 3 levels?  With such a small set of data, I would think it 
would flush one data file like the original data, but instead there are 3 files.
See above.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/03/2013, at 6:40 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

I ran a pretty solid QA test (cleaned data from scratch) on version 1.2.2

My test was as so

1.  Start up 4 node cassandra cluster
2.  Populate with initial test data (no other data is added to system after 
this point!!!)
3.  Run nodetool drain on every node (move stuff from commit log to sstables)
4.  Stop and start cassandra cluster to have it running again
5.  Get size of nreldata CF folder is 128kB
6.  Go to node 3, run snapshot and mv snapshots directory OUT of nreldata
7.  Get size of nreldata CF folder is 128kB
8.  On node 3, run nodetool drain
9.  Get size of nreldata CF folder is still 128kB
10. Stop cassandra node
11. Rm keyspace/nreldata/*.db
12. Size of nreldata CF is 8kB (odd for an empty folder, but OK)
13. Start cassandra
14. Nodetool repair databus5 nreldata
15. Size of nreldata is now 220K ….it has exploded in size!!

I ran this QA test as we see data size explosion in production as well (I can't 
be 100% sure if this is the same thing though, as above is such a small data 
set).  Would leveled compaction be a bit more stable in terms of size ratios 
and such?

QUESTIONS

1.  Why is the bloom filter for level 5 a total of 3856 bytes for 29118 (large to 
small) bytes of data, while in the initial data it was 2192 bytes for 
43038 (small to large) bytes of data?
2.  Why are there 3 levels?  With such a small set of data, I would think it 
would flush one data file like the original data, but instead there are 3 files.

My files after repair have levels 5, 6, and 7.  My files before deletion of the 
CF have just level 1.  After repair files are
-rw-rw-r--.  1 cassandra cassandra    54 Mar  6 07:18 databus5-nreldata-ib-5-CompressionInfo.db
-rw-rw-r--.  1 cassandra cassandra 29118 Mar  6 07:18 databus5-nreldata-ib-5-Data.db
-rw-rw-r--.  1 cassandra cassandra  3856 Mar  6 07:18 databus5-nreldata-ib-5-Filter.db
-rw-rw-r--.  1 cassandra cassandra 37000 Mar  6 07:18 databus5-nreldata-ib-5-Index.db
-rw-rw-r--.  1 cassandra cassandra  4772 Mar  6 07:18 databus5-nreldata-ib-5-Statistics.db
-rw-rw-r--.  1 cassandra cassandra   383 Mar  6 07:18 databus5-nreldata-ib-5-Summary.db
-rw-rw-r--.  1 cassandra cassandra    79 Mar  6 07:18 databus5-nreldata-ib-5-TOC.txt
-rw-rw-r--.  1 cassandra cassandra    46 Mar  6 07:18 databus5-nreldata-ib-6-CompressionInfo.db
-rw-rw-r--.  1 cassandra cassandra 14271 Mar  6 07:18 databus5-nreldata-ib-6-Data.db
-rw-rw-r--.  1 cassandra cassandra   816 Mar  6 07:18 databus5-nreldata-ib-6-Filter.db
-rw-rw-r--.  1 cassandra cassandra 18248 Mar  6 07:18 databus5-nreldata-ib-6-Index.db
-rw-rw-r--.  1 cassandra cassandra  4756 Mar  6 07:18 databus5-nreldata-ib-6-Statistics.db
-rw-rw-r--.  1 cassandra cassandra   230 Mar  6 07:18 databus5-nreldata-ib-6-Summary.db
-rw-rw-r--.  1 cassandra cassandra    79 Mar  6 07:18 databus5-nreldata-ib-6-TOC.txt
-rw-rw-r--.  1 cassandra cassandra    46 Mar  6 07:18 

Re: Cassandra instead of memcached

2013-03-06 Thread Drew Kutcharian
Thanks guys, this is what I was looking for.

@Edward. I definitely like crazy ideas ;), I think the only issue here is that 
C* is a disk space hog, so not sure if that would be feasible since free RAM is 
not as abundant as disk. BTW, I watched your presentation, are you guys still 
using C* as an in-memory store?




On Mar 6, 2013, at 7:44 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 http://www.slideshare.net/edwardcapriolo/cassandra-as-memcache
 
 Read at ONE.
 READ_REPAIR_CHANCE as low as possible.
 
 Use short TTL and short GC_GRACE.
 
 Make the in memory memtable size as high as possible to avoid flushing and 
 compacting.
 
 Optionally turn off commit log.
 
 You can use cassandra like memcache but it is not a memcache replacement. 
 Cassandra persists writes and compacts SSTables, memcache only has to keep 
 data in memory.
 
 If you want to try a crazy idea, try putting your persistent data on a ram 
 disk! Not data/system however!
 
 
 
 
 
 
 On Wed, Mar 6, 2013 at 2:45 AM, aaron morton aa...@thelastpickle.com wrote:
 consider disabling durable_writes in the KS config to remove writing to the 
 commit log. That will speed things up for you. Note that you risk losing data 
 if cassandra crashes or is not shut down with nodetool drain. 
 
 Even if you set the gc_grace to 0, deletes will still need to be committed to 
 disk. 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 5/03/2013, at 9:51 AM, Drew Kutcharian d...@venarc.com wrote:
 
 Thanks Ben, that article was actually the reason I started thinking about 
 removing memcached.
 
 I wanted to see what would be the optimum config to use C* as an in-memory 
 store.
 
 -- Drew
 
 
 On Mar 5, 2013, at 2:39 AM, Ben Bromhead b...@instaclustr.com wrote:
 
 Check out 
 http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
 
 Netflix used Cassandra with SSDs and were able to drop their memcache 
 layer. Mind you they were not using it purely as an in memory KV store.
 
 Ben
 Instaclustr | www.instaclustr.com | @instaclustr
 
 
 
 On 05/03/2013, at 4:33 PM, Drew Kutcharian d...@venarc.com wrote:
 
 Hi Guys,
 
 I'm thinking about using Cassandra as an in-memory key/value store instead 
 of memcached for a new project (just to get rid of a dependency if 
 possible). I was thinking about setting the replication factor to 1, 
 enabling off-heap row-cache and setting gc_grace_period to zero for the CF 
 that will be used for the key/value store.
 
 Has anyone tried this? Any comments?
 
 Thanks,
 
 Drew
 
 
 
 
 
 



Re: Cassandra instead of memcached

2013-03-06 Thread Edward Capriolo
If you're writing much more data than RAM, cassandra will not work as fast as
memcache. Cassandra is not magical: if all of your data fits in memory it
is going to be fast, and if most of your data fits in memory it can still be
fast. However if you plan on having much more data than RAM you need to
think about more RAM and/or SSD disks.

We do not use c* as an in-memory store. However for many of our datasets
we do not have a separate caching tier. In those cases cassandra is both
our database and our in-memory store if you want to use those terms :)

On Wed, Mar 6, 2013 at 12:02 PM, Drew Kutcharian d...@venarc.com wrote:

 Thanks guys, this is what I was looking for.

 @Edward. I definitely like crazy ideas ;), I think the only issue here is
 that C* is a disk space hog, so not sure if that would be feasible since
 free RAM is not as abundant as disk. BTW, I watched your presentation, are
 you guys still using C* as an in-memory store?




 On Mar 6, 2013, at 7:44 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 http://www.slideshare.net/edwardcapriolo/cassandra-as-memcache

 Read at ONE.
 READ_REPAIR_CHANCE as low as possible.

 Use short TTL and short GC_GRACE.

 Make the in memory memtable size as high as possible to avoid flushing and
 compacting.

 Optionally turn off commit log.

 You can use cassandra like memcache but it is not a memcache replacement.
 Cassandra persists writes and compacts SSTables, memcache only has to keep
 data in memory.

 If you want to try a crazy idea, try putting your persistent data on a ram
 disk! Not data/system however!






 On Wed, Mar 6, 2013 at 2:45 AM, aaron morton aa...@thelastpickle.com wrote:

 consider disabling durable_writes in the KS config to remove writing to
 the commit log. That will speed things up for you. Note that you risk
 losing data if cassandra crashes or is not shut down with nodetool drain.

 Even if you set the gc_grace to 0, deletes will still need to be
 committed to disk.

 Cheers

-
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 5/03/2013, at 9:51 AM, Drew Kutcharian d...@venarc.com wrote:

 Thanks Ben, that article was actually the reason I started thinking about
 removing memcached.

 I wanted to see what would be the optimum config to use C* as an
 in-memory store.

 -- Drew


 On Mar 5, 2013, at 2:39 AM, Ben Bromhead b...@instaclustr.com wrote:

 Check out
 http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html

 Netflix used Cassandra with SSDs and were able to drop their memcache
 layer. Mind you they were not using it purely as an in memory KV store.

 Ben
 Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr



 On 05/03/2013, at 4:33 PM, Drew Kutcharian d...@venarc.com wrote:

 Hi Guys,

 I'm thinking about using Cassandra as an in-memory key/value store
 instead of memcached for a new project (just to get rid of a dependency if
 possible). I was thinking about setting the replication factor to 1,
 enabling off-heap row-cache and setting gc_grace_period to zero for the CF
 that will be used for the key/value store.

 Has anyone tried this? Any comments?

 Thanks,

 Drew










Re: Cassandra instead of memcached

2013-03-06 Thread Drew Kutcharian
I think the dataset should fit in memory easily. The main purpose of this would 
be as a store for an API rate limiting/accounting system. I think the ebay guys are 
using C* too for the same reason. Initially we were thinking of using Hazelcast 
or memcached. But Hazelcast (at least the community edition) has Java gc issues 
with big heaps, and the problem with memcached is the lack of a reliable 
distribution (you lose a node, you need to rehash everything), so I figured why 
not just use C*.
 


On Mar 6, 2013, at 9:08 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 If you're writing much more data than RAM, cassandra will not work as fast as 
 memcache. Cassandra is not magical: if all of your data fits in memory it is 
 going to be fast, and if most of your data fits in memory it can still be fast. 
 However if you plan on having much more data than RAM you need to think 
 about more RAM and/or SSD disks.
 
 We do not use c* as an in-memory store. However for many of our datasets we 
 do not have a separate caching tier. In those cases cassandra is both our 
 database and our in-memory store if you want to use those terms :)
 
 On Wed, Mar 6, 2013 at 12:02 PM, Drew Kutcharian d...@venarc.com wrote:
 Thanks guys, this is what I was looking for.
 
 @Edward. I definitely like crazy ideas ;), I think the only issue here is 
 that C* is a disk space hog, so not sure if that would be feasible since free 
 RAM is not as abundant as disk. BTW, I watched your presentation, are you 
 guys still using C* as an in-memory store?
 
 
 
 
 On Mar 6, 2013, at 7:44 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
 
 http://www.slideshare.net/edwardcapriolo/cassandra-as-memcache
 
 Read at ONE.
 READ_REPAIR_CHANCE as low as possible.
 
 Use short TTL and short GC_GRACE.
 
 Make the in memory memtable size as high as possible to avoid flushing and 
 compacting.
 
 Optionally turn off commit log.
 
 You can use cassandra like memcache but it is not a memcache replacement. 
 Cassandra persists writes and compacts SSTables, memcache only has to keep 
 data in memory.
 
 If you want to try a crazy idea, try putting your persistent data on a ram 
 disk! Not data/system however!
 
 
 
 
 
 
 On Wed, Mar 6, 2013 at 2:45 AM, aaron morton aa...@thelastpickle.com wrote:
 consider disabling durable_writes in the KS config to remove writing to the 
 commit log. That will speed things up for you. Note that you risk losing 
 data if cassandra crashes or is not shut down with nodetool drain. 
 
 Even if you set the gc_grace to 0, deletes will still need to be committed 
 to disk. 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 5/03/2013, at 9:51 AM, Drew Kutcharian d...@venarc.com wrote:
 
 Thanks Ben, that article was actually the reason I started thinking about 
 removing memcached.
 
 I wanted to see what would be the optimum config to use C* as an in-memory 
 store.
 
 -- Drew
 
 
 On Mar 5, 2013, at 2:39 AM, Ben Bromhead b...@instaclustr.com wrote:
 
 Check out 
 http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
 
 Netflix used Cassandra with SSDs and were able to drop their memcache 
 layer. Mind you they were not using it purely as an in memory KV store.
 
 Ben
 Instaclustr | www.instaclustr.com | @instaclustr
 
 
 
 On 05/03/2013, at 4:33 PM, Drew Kutcharian d...@venarc.com wrote:
 
 Hi Guys,
 
 I'm thinking about using Cassandra as an in-memory key/value store 
 instead of memcached for a new project (just to get rid of a dependency 
 if possible). I was thinking about setting the replication factor to 1, 
 enabling off-heap row-cache and setting gc_grace_period to zero for the 
 CF that will be used for the key/value store.
 
 Has anyone tried this? Any comments?
 
 Thanks,
 
 Drew
 
 
 
 
 
 
 
 



Re: Cassandra instead of memcached

2013-03-06 Thread Wei Zhu
It also depends on your SLA; it should work for 99% of the time. But one 
GC/flush/compact could screw things up big time if you have a tight SLA.
-Wei



 From: Drew Kutcharian d...@venarc.com
To: user@cassandra.apache.org 
Sent: Wednesday, March 6, 2013 9:32 AM
Subject: Re: Cassandra instead of memcached
 

I think the dataset should fit in memory easily. The main purpose of this would 
be as a store for an API rate limiting/accounting system. I think the ebay guys are 
using C* too for the same reason. Initially we were thinking of using Hazelcast 
or memcached. But Hazelcast (at least the community edition) has Java gc issues 
with big heaps, and the problem with memcached is the lack of a reliable 
distribution (you lose a node, you need to rehash everything), so I figured why 
not just use C*.
 




On Mar 6, 2013, at 9:08 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

If you're writing much more data than RAM, cassandra will not work as fast as 
memcache. Cassandra is not magical: if all of your data fits in memory it is 
going to be fast, and if most of your data fits in memory it can still be fast. 
However if you plan on having much more data than RAM you need to think about 
more RAM and/or SSD disks.



We do not use c* as an in-memory store. However for many of our datasets we 
do not have a separate caching tier. In those cases cassandra is both our 
database and our in-memory store if you want to use those terms :)

On Wed, Mar 6, 2013 at 12:02 PM, Drew Kutcharian d...@venarc.com wrote:

Thanks guys, this is what I was looking for.


@Edward. I definitely like crazy ideas ;), I think the only issue here is 
that C* is a disk space hog, so not sure if that would be feasible since free 
RAM is not as abundant as disk. BTW, I watched your presentation, are you 
guys still using C* as an in-memory store?








On Mar 6, 2013, at 7:44 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

http://www.slideshare.net/edwardcapriolo/cassandra-as-memcache



Read at ONE.
READ_REPAIR_CHANCE as low as possible.


Use short TTL and short GC_GRACE.


Make the in memory memtable size as high as possible to avoid flushing and 
compacting.


Optionally turn off commit log.


You can use cassandra like memcache but it is not a memcache replacement. 
Cassandra persists writes and compacts SSTables, memcache only has to keep 
data in memory.


If you want to try a crazy idea, try putting your persistent data on a ram 
disk! Not data/system however!











On Wed, Mar 6, 2013 at 2:45 AM, aaron morton aa...@thelastpickle.com wrote:

consider disabling durable_writes in the KS config to remove writing to the 
commit log. That will speed things up for you. Note that you risk losing 
data if cassandra crashes or is not shut down with nodetool drain. 


Even if you set the gc_grace to 0, deletes will still need to be committed 
to disk. 


Cheers


-
Aaron Morton
Freelance Cassandra Developer
New Zealand


@aaronmorton
http://www.thelastpickle.com/

On 5/03/2013, at 9:51 AM, Drew Kutcharian d...@venarc.com wrote:

Thanks Ben, that article was actually the reason I started thinking about 
removing memcached.


I wanted to see what would be the optimum config to use C* as an in-memory 
store.


-- Drew





On Mar 5, 2013, at 2:39 AM, Ben Bromhead b...@instaclustr.com wrote:

Check out 
http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html


Netflix used Cassandra with SSDs and were able to drop their memcache 
layer. Mind you they were not using it purely as an in memory KV store.


Ben
Instaclustr | www.instaclustr.com | @instaclustr




On 05/03/2013, at 4:33 PM, Drew Kutcharian d...@venarc.com wrote:

Hi Guys,

I'm thinking about using Cassandra as an in-memory key/value store 
instead of memcached for a new project (just to get rid of a dependency 
if possible). I was thinking about setting the replication factor to 1, 
enabling off-heap row-cache and setting gc_grace_period to zero for the 
CF that will be used for the key/value store.

Has anyone tried this? Any comments?

Thanks,

Drew










Hinted handoff

2013-03-06 Thread Kanwar Sangha
Hi - Is there a way to increase the hinted handoff throughput ? I am seeing 
around 8Mb/s (bits).

Thanks,
Kanwar



RE: Hinted handoff

2013-03-06 Thread Kanwar Sangha
Got the param. thanks

From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 06 March 2013 13:50
To: user@cassandra.apache.org
Subject: Hinted handoff

Hi - Is there a way to increase the hinted handoff throughput ? I am seeing 
around 8Mb/s (bits).

Thanks,
Kanwar



RE: Hinted handoff

2013-03-06 Thread Kanwar Sangha
After trying to bump up the hinted_handoff_throttle_in_kb to 1Gb per sec, it 
still does not go above 25Mb/s.  Is there a limitation ?
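The relevant cassandra.yaml knobs, for reference (1.2 option names; values
here are illustrative, not recommendations):

hinted_handoff_enabled: true
hinted_handoff_throttle_in_kb: 10240    # max KB per second, per delivery thread
max_hints_delivery_threads: 2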



From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 06 March 2013 14:41
To: user@cassandra.apache.org
Subject: RE: Hinted handoff

Got the param. thanks

From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 06 March 2013 13:50
To: user@cassandra.apache.org
Subject: Hinted handoff

Hi - Is there a way to increase the hinted handoff throughput ? I am seeing 
around 8Mb/s (bits).

Thanks,
Kanwar



RE: Hinted handoff

2013-03-06 Thread Kanwar Sangha
Is this correct ?

I have Raid 0 setup for 16 TB across 8 disks. Each disk is 7.2kRPM with IOPS of 
80 per disk. Data is ~9.5 TB

So 4K * 80 * 9.5 = 3040 KB ~  23.75 Mb/s.

So basically I am limited at the disk rather than the n/w

From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 06 March 2013 15:11
To: user@cassandra.apache.org
Subject: RE: Hinted handoff

After trying to bump up the hinted_handoff_throttle_in_kb to 1Gb per sec, it 
still does not go above 25Mb/s.  Is there a limitation ?



From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 06 March 2013 14:41
To: user@cassandra.apache.org
Subject: RE: Hinted handoff

Got the param. thanks

From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 06 March 2013 13:50
To: user@cassandra.apache.org
Subject: Hinted handoff

Hi - Is there a way to increase the hinted handoff throughput ? I am seeing 
around 8Mb/s (bits).

Thanks,
Kanwar



Re: Consistent problem when solve Digest mismatch

2013-03-06 Thread Jason Tang
Actually I didn't concurrently update the same records, because I first
create it, then search it, then delete it. The version conflict was resolved
incorrectly, because the delete's local timestamp is earlier than the create's
local timestamp.


2013/3/6 aaron morton aa...@thelastpickle.com

 Otherwise, it means the version conflict solving strong depends on global
 sequence id (timestamp) which need provide by client ?

 Yes.
 If you have an  area of your data model that has a high degree of
 concurrency C* may not be the right match.

 In 1.1 we have atomic updates so clients see either the entire write or
 none of it. And sometimes you can design a data model that does not mutate
 shared values, but writes ledger entries instead. See Matt Dennis' talk here
 http://www.datastax.com/events/cassandrasummit2012/presentations or this
 post http://thelastpickle.com/2012/08/18/Sorting-Lists-For-Humans/

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 4/03/2013, at 4:30 PM, Jason Tang ares.t...@gmail.com wrote:

 Hi

 The timestamp provided by my client is a unix timestamp (with ntp), and as I
 said, due to ntp drift, the local unix timestamps are not accurately
 synchronized (compared to my case).

 So for short, the client cannot provide a global sequence number to indicate
 the event order.

 But I wonder, I configured Cassandra consistency level as write QUORUM. So
 for one record, I suppose Cassandra has the ability to decide the final
 update results.

 Otherwise, it means the version conflict solving strong depends on global
 sequence id (timestamp) which need provide by client ?


 //Tang


 2013/3/4 Sylvain Lebresne sylv...@datastax.com

 The problem is, what exactly is the sequence number you are talking
 about?

 Or let me put it another way: if you do have a sequence number that
 provides a total ordering of your operation, then that is exactly what you
 should use as your timestamp. What Cassandra calls the timestamp, is
 exactly what you call seqID, it's the number Cassandra uses to decide the
 order of operation.

 Except that in real life, provided you have more than one client talking
 to Cassandra, providing a total ordering of operations is hard, and in
 fact not doable efficiently. So in practice, people use unix timestamps
 (with ntp), which provide a very good yet cheap approximation of the real
 life order of operations.

 But again, if you do know how to assign a more precise timestamp,
 Cassandra lets you use that: you can provide your own timestamp (using unix
 timestamp is just the default). The point being, unix timestamp is the
 best approximation we have in practice.
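 For example (a sketch, not from this thread; CQL timestamps are in
 microseconds):

 UPDATE mycf USING TIMESTAMP 1362571200000000
   SET status = 'deleted' WHERE key = 'k1';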

 --
 Sylvain


 On Mon, Mar 4, 2013 at 9:26 AM, Jason Tang ares.t...@gmail.com wrote:

 Hi

   Previously I met a consistency problem; you can refer to the link below for
 the whole story.

 http://mail-archives.apache.org/mod_mbox/cassandra-user/201206.mbox/%3CCAFb+LUxna0jiY0V=AvXKzUdxSjApYm4zWk=ka9ljm-txc04...@mail.gmail.com%3E

   And after checking the code, it seems I found some clue to the problem.
 Maybe someone can check this.

   For short, I have a Cassandra cluster (1.0.3). The consistency level is
 read/write quorum, and replication_factor is 3.

   Here is the event sequence:

 seqID   NodeA    NodeB    NodeC
 1.      New      New      New
 2.      Update   Update   Update
 3.      Delete            Delete

 When trying to read from NodeB and NodeC, a Digest mismatch exception is
 triggered, so Cassandra tries to resolve this version conflict.
 But the result is the value Update.

 Here is the suspected root cause: the version conflict is resolved based
 on timestamp.

 Node C's local time is a bit earlier than node A's.

 Update requests were sent from node C with timestamp 00:00:00.050, and the
 Delete was sent from node A with timestamp 00:00:00.020, which is not the
 same order as the event sequence.

 So the version conflict is resolved incorrectly.

 Is it true?

 If yes, then it means the consistency level can ensure the conflict is
 found, but solving it correctly depends on the accuracy of time
 synchronization, e.g. NTP?








Correct way to set ByteOrderedPartitioner initial tokens

2013-03-06 Thread Mateus Ferreira e Freitas

I have 4 Nodes, and I'd like to store all keys starting with 'a' on node 1, 'b' 
on 2, and so on. My keys just start with a letter and numbers follow, like 
'a150', 'b1', 'c32000'. I've set the initial tokens to 61ff, 62ff, 63ff, 64ff. 
This does not seem to be the correct way. Thanks.


Re: Hinted handoff

2013-03-06 Thread aaron morton
Check the IO utilisation using iostat
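For example, extended per-device stats refreshed every 5 seconds:

iostat -x 5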

You *really* should not need to make HH run faster; if you do, there is 
something bad going on. I would consider dropping the hints and running repair. 

 Data is ~9.5 TB
Do you have 9.5TB on a single node ? 
In the normal case it's best to have around 300 to 500GB per node. With that 
much data it will take a week to run repair or replace a failed node. 

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/03/2013, at 1:22 PM, Kanwar Sangha kan...@mavenir.com wrote:

 Is this correct ?
  
 I have Raid 0 setup for 16 TB across 8 disks. Each disk is 7.2kRPM with IOPS 
 of 80 per disk. Data is ~9.5 TB
  
 So 4K * 80 * 9.5 = 3040 KB ~  23.75 Mb/s.
  
 So basically I am limited at the disk rather than the n/w
  
 From: Kanwar Sangha [mailto:kan...@mavenir.com] 
 Sent: 06 March 2013 15:11
 To: user@cassandra.apache.org
 Subject: RE: Hinted handoff
  
 After trying to bump up the “hinted_handoff_throttle_in_kb” to 1Gb per sec, 
 it still does not go above 25Mb/s.  Is there a limitation ?
  
  
  
 From: Kanwar Sangha [mailto:kan...@mavenir.com] 
 Sent: 06 March 2013 14:41
 To: user@cassandra.apache.org
 Subject: RE: Hinted handoff
  
 Got the param. thanks
  
 From: Kanwar Sangha [mailto:kan...@mavenir.com] 
 Sent: 06 March 2013 13:50
 To: user@cassandra.apache.org
 Subject: Hinted handoff
  
 Hi – Is there a way to increase the hinted handoff throughput ? I am seeing 
 around 8Mb/s (bits).
  
 Thanks,
 Kanwar



Re: should I file a bug report on this or is this normal?

2013-03-06 Thread aaron morton
 but based on how the rows are spread through the sstable files?
It's per sstable. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/03/2013, at 8:51 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

 Thanks for the great info, I will give it a go.
 
 1 question though, my false positive rate and number of rows are not changing, 
 so why is the bloom filter bigger?  Or do you mean the bloom filter is not based on 
 the number of rows in the table but on how the rows are spread through the 
 sstable files?
 
 Ie. I have the same amount of rows before and after in that specific column 
 family.
 
 
 Thanks,
 Dean
 
 From: aaron morton aa...@thelastpickle.com
 Reply-To: user@cassandra.apache.org
 Date: Wednesday, March 6, 2013 9:29 AM
 To: user@cassandra.apache.org
 Subject: Re: should I file a bug report on this or is this normal?
 
 15. Size of nreldata is now 220K ….it has exploded in size!!
 This may be explained by fragmentation in the sstables, which compaction 
 would eventually resolve.
 
 During repair the data came from multiple nodes and created multiple sstables 
 for each CF. Streaming copies part of an SSTable on the source and creates an 
 SSTable on the destination. This pattern is different to all writes for a CF 
 going to the same sstable when flushed.
 
 To compare apples to apples run a major compaction after the initial data 
 load, and after the repair.
 
 1.  Why is the bloom filter for level 5 a total of 3856 bytes for 29118 (large 
 to small) bytes of data, while in the initial data it was 2192 bytes for 
 43038 (small to large) bytes of data?
 The size of the BF depends on the number of rows and the false positive rate. 
 Not the size of the -Data.db component on disk.
 
 2.  Why are there 3 levels?  With such a small set of data, I would think it 
 would flush one data file like the original data, but instead there are 3 files.
 See above.
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
On 6/03/2013, at 6:40 AM, Hiller, Dean dean.hil...@nrel.gov wrote:
 
 I ran a pretty solid QA test (cleaned data from scratch) on version 1.2.2
 
 My test was as so
 
 1.  Start up 4 node cassandra cluster
 2.  Populate with initial test data (no other data is added to system after 
 this point!!!)
 3.  Run nodetool drain on every node (move stuff from commit log to sstables)
 4.  Stop and start cassandra cluster to have it running again
 5.  Get size of nreldata CF folder is 128kB
 6.  Go to node 3, run snapshot and mv snapshots directory OUT of nreldata
 7.  Get size of nreldata CF folder is 128kB
 8.  On node 3, run nodetool drain
 9.  Get size of nreldata CF folder is still 128kB
 10. Stop cassandra node
 11. Rm keyspace/nreldata/*.db
 12. Size of nreldata CF is 8kB (odd for an empty folder, but OK)
 13. Start cassandra
 14. Nodetool repair databus5 nreldata
 15. Size of nreldata is now 220K ….it has exploded in size!!
 
 I ran this QA test as we see data size explosion in production as well (I 
 can't be 100% sure if this is the same thing though, as above is such a small 
 data set).  Would leveled compaction be a bit more stable in terms of size 
 ratios and such?
 
 QUESTIONS
 
 1.  Why is the bloom filter for level 5 a total of 3856 bytes for 29118 (large 
 to small) bytes of data, while in the initial data it was 2192 bytes for 
 43038 (small to large) bytes of data?
 2.  Why are there 3 levels?  With such a small set of data, I would think it 
 would flush one data file like the original data, but instead there are 3 files.
 
 My files after repair have levels 5, 6, and 7.  My files before deletion of 
 the CF have just level 1.  After repair files are
 -rw-rw-r--.  1 cassandra cassandra    54 Mar  6 07:18 databus5-nreldata-ib-5-CompressionInfo.db
 -rw-rw-r--.  1 cassandra cassandra 29118 Mar  6 07:18 databus5-nreldata-ib-5-Data.db
 -rw-rw-r--.  1 cassandra cassandra  3856 Mar  6 07:18 databus5-nreldata-ib-5-Filter.db
 -rw-rw-r--.  1 cassandra cassandra 37000 Mar  6 07:18 databus5-nreldata-ib-5-Index.db
 -rw-rw-r--.  1 cassandra cassandra  4772 Mar  6 07:18 databus5-nreldata-ib-5-Statistics.db
 -rw-rw-r--.  1 cassandra cassandra   383 Mar  6 07:18 databus5-nreldata-ib-5-Summary.db
 -rw-rw-r--.  1 cassandra cassandra    79 Mar  6 07:18 databus5-nreldata-ib-5-TOC.txt
 -rw-rw-r--.  1 cassandra cassandra    46 Mar  6 07:18 databus5-nreldata-ib-6-CompressionInfo.db
 -rw-rw-r--.  1 cassandra cassandra 14271 Mar  6 07:18 databus5-nreldata-ib-6-Data.db
 -rw-rw-r--.  1 cassandra cassandra   816 Mar  6 07:18 databus5-nreldata-ib-6-Filter.db
 -rw-rw-r--.  1 cassandra cassandra 18248 Mar  6 07:18 

Cassandra OOM, many deletedColumn

2013-03-06 Thread 金剑
Hi,

My version is  1.1.7

Our use case is: we have an index column family to record how many resources
are stored for a user. The number might vary from tens to millions.

We provide a feature to let users delete resources according to a prefix.


 we found some cassandra nodes will OOM after some period. The cluster is a kind
of cross-datacenter ring.

1. Exception in cassandra log:

ERROR [Thread-5810] 2013-02-04 05:38:13,882 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[Thread-5810,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
        at java.util.concurrent.ThreadPoolExecutor.ensureQueuedTaskHandled(ThreadPoolExecutor.java:758)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:655)
        at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)

ERROR [Thread-5819] 2013-02-04 05:38:13,888 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[Thread-5819,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
        at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)

ERROR [Thread-36] 2013-02-04 05:38:13,898 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[Thread-36,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
        at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)

ERROR [Thread-3990] 2013-02-04 05:38:13,902 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[Thread-3990,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
        at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)

ERROR [ACCEPT-/10.139.50.62] AbstractCassandraDaemon.java (line 135) Exception in thread Thread[ACCEPT-/10.139.50.62,5,main]
java.lang.RuntimeException: java.nio.channels.ClosedChannelException
        at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:710)
Caused by: java.nio.channels.ClosedChannelException
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:137)
        at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
        at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:699)

 INFO [HintedHandoff:1] 2013-02-04 05:38:24,971 HintedHandOffManager.java (line 374) Timed out replaying hints to /23.20.84.240; aborting further deliveries
 INFO [HintedHandoff:1] 2013-02-04 05:38:24,971 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint
 INFO [HintedHandoff:1] 2013-02-04 05:38:24,971 HintedHandOffManager.java (line 296) Started hinted handoff for token: 3

2. From the heap dump, there are many DeletedColumn objects, rooted in the
readStage threads.
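
For context on why those two observations line up: deleting by prefix turns
every matched column into a tombstone (a DeletedColumn), and until
gc_grace_seconds passes and compaction purges them, reads over that range must
materialize the tombstones. A minimal pycassa-style sketch of such a prefix
delete (the pool, keyspace, and CF names are hypothetical, and a UTF8Type
column comparator is assumed):

import pycassa

pool = pycassa.ConnectionPool('demo_ks', ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'user_resources')

def delete_by_prefix(row_key, prefix, batch_size=1000):
    # Each removed column becomes a tombstone that later range reads
    # must scan past until compaction purges it after gc_grace_seconds.
    start, end = prefix, prefix + u'\uffff'
    while True:
        try:
            cols = cf.get(row_key, column_start=start,
                          column_finish=end, column_count=batch_size)
        except pycassa.NotFoundException:
            break
        names = list(cols.keys())
        cf.remove(row_key, columns=names)  # one tombstone per column
        if len(names) < batch_size:
            break
        start = names[-1]  # now deleted, so the next get skips past it

If reads routinely slice across ranges that were bulk-deleted this way, each
read drags those tombstones into the heap, which would match DeletedColumn
objects rooted in readStage.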


Please help: where might the problem be?

Best Regards!

Jian Jin


Re: Correct way to set ByteOrderedPartitioner initial tokens

2013-03-06 Thread aaron morton
 I have 4 Nodes, and I'd like to store all keys starting with 'a' on node 1, 
 'b' on 2, and so on.
Can I ask why?

In general you *really* don't want to use the ByteOrderedPartitioner. If you are 
starting out, you will have a happier time if you start with the Random 
Partitioner. 

If you want your code to know where the rows are, take a look at the Astyanax 
client: https://github.com/Netflix/astyanax

 I've set the initial tokens to 61ff, 62ff ,63ff, 64ff .
I think you want to set them to the letter and then the highest value you will 
ever use, coded as hex.
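
As an illustration of that scheme, a short Python sketch that builds
end-of-range tokens by padding the key prefix with 0xff bytes (the token width
here is an arbitrary illustrative choice, not an official value):

import binascii

def end_token_for_prefix(prefix, width=8):
    # Token for the END of the key range starting with `prefix` under
    # the ByteOrderedPartitioner: the prefix bytes padded with 0xff.
    raw = prefix.encode('ascii')
    return binascii.hexlify(raw + b'\xff' * (width - len(raw))).decode()

for p in 'abcd':
    print(p, '->', end_token_for_prefix(p))
# 'a' -> 61ffffffffffffff, so a key like 'a150' (hex 61313530) sorts at
# or below node 1's token, 'b...' keys below node 2's, and so on.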

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/03/2013, at 9:31 PM, Mateus Ferreira e Freitas 
mateus.ffrei...@hotmail.com wrote:

 I have 4 Nodes, and I'd like to store all keys starting with 'a' on node 1, 
 'b' on 2, and so on.
 My keys just start with a letter and numbers follow, like 'a150', 'b1', 
 'c32000'.
 I've set the initial tokens to 61ff, 62ff, 63ff, 64ff.
 This does not seem to be the correct way.
 Thanks.



Re: Cassandra OOM, many deletedColumn

2013-03-06 Thread Jason Wee
hmm.. did you manage to take a look using nodetool tpstats? That may give you
a further indication..
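
For example, a quick way to flag pools with work backing up (a sketch only: it
assumes nodetool is on the PATH and the 1.1-style tpstats columns of Pool Name,
Active, Pending, Completed; adjust the parsing if your output differs):

import subprocess

out = subprocess.check_output(['nodetool', '-h', 'localhost', 'tpstats'])
for line in out.decode().splitlines():
    parts = line.split()
    # Pool rows look like: ReadStage <active> <pending> <completed> ...
    if len(parts) >= 4 and parts[1].isdigit() and parts[2].isdigit():
        pool, active, pending = parts[0], int(parts[1]), int(parts[2])
        if pending > 0:
            print('%-24s active=%d pending=%d' % (pool, active, pending))

A consistently large Pending count on ReadStage would fit reads grinding
through tombstones.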

Jason


On Thu, Mar 7, 2013 at 1:56 PM, 金剑 jinjia...@gmail.com wrote:

 Hi,

 My version is 1.1.7.

 [snip -- the rest of the original message, including the stack traces and
 heap-dump notes, is quoted in full above]