column with TTL of 10 seconds lives very long...

2013-05-23 Thread Tamar Fraenkel
Hi!
I have a Cassandra cluster with 3 nodes running version 1.0.11.

I am using Hector HLockManagerImpl, which creates a keyspace named
HLockManagerImpl and CF HLocks.
For some reason I have a row with a single column that should have expired
yesterday but is still there.
I tried deleting it using the CLI, but the delete got stuck...
Any ideas on how to delete it?

Thanks,

*Tamar Fraenkel*
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956

Re: column with TTL of 10 seconds lives very long...

2013-05-23 Thread Nikolay Mihaylov
Did you synchronize the clocks between the servers?



Re: column with TTL of 10 seconds lives very long...

2013-05-23 Thread Tamar Fraenkel
Thanks for the response.
Running the date command simultaneously on all nodes (using parallel ssh)
shows that they are synced.
Tamar
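Plain date output only has one-second resolution, so a check like this can miss sub-second skew. A minimal sketch, assuming readings were collected with something like pssh running GNU date +%s.%N on each node; the node names and values are made up, and max_clock_skew is a hypothetical helper:

```python
# Hypothetical per-node readings in seconds since the epoch, e.g. gathered with
#   pssh -h nodes.txt -i 'date +%s.%N'
# Node names and values are made up for illustration.
sample_readings = {
    "node1": 1369303080.120,
    "node2": 1369303080.135,
    "node3": 1369303080.610,
}


def max_clock_skew(readings: dict[str, float]) -> float:
    """Spread (max - min) in seconds between per-node clock readings.

    Readings taken over ssh are not simultaneous, so the result mixes
    connection and scheduling delay with real clock skew.
    """
    if not readings:
        raise ValueError("need at least one reading")
    values = list(readings.values())
    return max(values) - min(values)


print(f"max skew: {max_clock_skew(sample_readings):.3f} s")  # max skew: 0.490 s
```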


Re: column with TTL of 10 seconds lives very long...

2013-05-23 Thread Felipe Sere
This is interesting as it might affect me too :)
I have been observing deadlocks with HLockManagerImpl that don't get resolved
for a long time, even though the lock columns should only live for about
5-10 seconds.

Any ideas on how to investigate this further from the Cassandra side?


RE: column with TTL of 10 seconds lives very long...

2013-05-23 Thread moshe.kranc
Maybe you didn't set the TTL correctly.
Check the TTL of the column using CQL, e.g.:
SELECT TTL (colName) from colFamilyName WHERE condition;

___

This message is for information purposes only, it is not a recommendation, 
advice, offer or solicitation to buy or sell a product or service nor an 
official confirmation of any transaction. It is directed at persons who are 
professionals and is not intended for retail customer use. Intended for 
recipient only. This message is subject to the terms at: 
www.barclays.com/emaildisclaimer.

For important disclosures, please see: 
www.barclays.com/salesandtradingdisclaimer regarding market commentary from 
Barclays Sales and/or Trading, who are active market participants; and in 
respect of Barclays Research, including disclosures relating to specific 
issuers, please see http://publicresearch.barclays.com.

___

Re: column with TTL of 10 seconds lives very long...

2013-05-23 Thread Tamar Fraenkel
Hi!

TTL was set:

[default@HLockingManager] get
HLocks['/LockedTopic/31a30c12-652d-45b3-9ac2-0401cce85517'];
= (column=69b057d4-3578-4326-a9d9-c975cb8316d2,
value=36396230353764342d333537382d343332362d613964392d633937356362383331366432,
timestamp=1369307815049000, ttl=10)


Also, all other lock columns expire as expected.
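The timestamp in the CLI output above has 16 digits, which is consistent with microseconds since the Unix epoch. A small sketch for decoding it and for checking whether a column written at a given time with a given TTL should already have expired. The helper names are hypothetical, and the expiry rule is a simplification: it treats the write timestamp as the start of the TTL, whereas the server keeps its own expiration bookkeeping.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_timestamp_us(ts_us: int) -> datetime:
    """Microseconds since the Unix epoch -> timezone-aware UTC datetime."""
    return EPOCH + timedelta(microseconds=ts_us)  # exact integer arithmetic


def should_be_expired(write_time_s: float, ttl_s: int, now_s: float) -> bool:
    """Simplified rule: a column is expired once now >= write time + TTL.

    Assumption: the write time is taken as the start of the TTL; the
    server's own expiration bookkeeping may differ.
    """
    return now_s >= write_time_s + ttl_s


written = decode_timestamp_us(1369307815049000)  # value from the CLI output
print(written.isoformat())  # 2013-05-23T11:16:55.049000+00:00
write_s = 1369307815049000 / 1_000_000
print(should_be_expired(write_s, 10, write_s + 5))   # False
print(should_be_expired(write_s, 10, write_s + 11))  # True
```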

Thanks,
Tamar


RE: column with TTL of 10 seconds lives very long...

2013-05-23 Thread moshe.kranc
(Probably will not solve your problem, but worth mentioning): it's not enough
to check that the clocks of all the servers are synchronized. I believe that
the client node sets the timestamp for a record being written, so you should
also check the clocks on your Hector client nodes.
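Cassandra reconciles competing versions of a column by their client-supplied timestamps (the highest timestamp wins), which is why client clocks matter. A simplified model, not Cassandra's actual code, assuming a tombstone wins a timestamp tie. It shows how a delete issued from a machine with a correct clock can be shadowed by a column written by a client whose clock runs ahead. TTL expiry is not modelled, and the Cell and reconcile names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    value: Optional[bytes]  # None marks a tombstone (a deletion)
    timestamp: int          # client-supplied, microseconds since the epoch


def reconcile(a: Cell, b: Cell) -> Cell:
    """Last-write-wins: the version with the higher timestamp survives.

    Assumptions: on a timestamp tie a tombstone wins; a tie between two
    live values is resolved arbitrarily here (b is returned).
    """
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    return a if a.value is None else b


T0 = 1369307815049000          # an arbitrary "true" time, in microseconds
HOUR_US = 3_600 * 1_000_000

# A lock column written by a client whose clock runs one hour fast...
lock = Cell(b"lock-holder", timestamp=T0 + HOUR_US)
# ...and a delete issued a minute later from a machine with a correct clock.
delete = Cell(None, timestamp=T0 + 60 * 1_000_000)

print(reconcile(lock, delete).value)  # b'lock-holder': the delete is shadowed
```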


Re: column with TTL of 10 seconds lives very long...

2013-05-23 Thread Tamar Fraenkel
Good point!


Re: Commit Log Magic

2013-05-23 Thread Jonathan Ellis
SSTables must be sorted by token, or we can't compact them efficiently.
Since writes usually do not arrive in token order, we stage them in a
memtable first.
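A toy model of this point, with hypothetical Memtable and compact names (Cassandra's real classes and on-disk format differ): writes arrive in any token order, the memtable holds them in memory, a flush emits them sorted by token, and because every SSTable is sorted, compaction can be a single sequential k-way merge rather than a sort. Integer tokens and "newest timestamp wins" are simplifying assumptions.

```python
import heapq
from typing import Iterable, Iterator

# (token, timestamp, value); integer tokens are a simplifying assumption.
Entry = tuple[int, int, str]


class Memtable:
    """Toy memtable: accepts writes in any order, flushes them sorted by token."""

    def __init__(self) -> None:
        self._rows: dict[int, tuple[int, str]] = {}

    def put(self, token: int, timestamp: int, value: str) -> None:
        current = self._rows.get(token)
        # Keep only the newest version of each row while it is in memory.
        if current is None or timestamp >= current[0]:
            self._rows[token] = (timestamp, value)

    def flush(self) -> list[Entry]:
        """Emit an 'SSTable': a list of entries sorted by token, then clear."""
        sstable = [(tok, ts, val) for tok, (ts, val) in sorted(self._rows.items())]
        self._rows.clear()
        return sstable


def compact(*sstables: Iterable[Entry]) -> Iterator[Entry]:
    """One sequential k-way merge of token-sorted SSTables; newest timestamp wins."""
    # Order by token, and within a token by descending timestamp.
    merged = heapq.merge(*sstables, key=lambda e: (e[0], -e[1]))
    last_token = None
    for token, ts, value in merged:
        if token != last_token:  # first entry seen for a token is the newest
            yield (token, ts, value)
            last_token = token


m = Memtable()
for tok, ts, val in [(42, 1, "a"), (7, 2, "b"), (19, 3, "c")]:
    m.put(tok, ts, val)
old = m.flush()  # [(7, 2, 'b'), (19, 3, 'c'), (42, 1, 'a')]
m.put(19, 5, "c2")
m.put(3, 6, "d")
new = m.flush()  # [(3, 6, 'd'), (19, 5, 'c2')]
print(list(compact(old, new)))
# [(3, 6, 'd'), (7, 2, 'b'), (19, 5, 'c2'), (42, 1, 'a')]
```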

(cc user@)

On Thu, May 23, 2013 at 8:44 AM, Ansar Rafique ansa...@hotmail.com wrote:
 Hi Jonathan,

 I am Ansar Rafique; I asked you a few questions about Cassandra's
 implementation two weeks ago. I was watching your presentation, in which you
 suggested the page below.

 http://nosql.mypopescu.com/post/27684111441/cassandra-and-solid-state-drives

 I have a question that I have tried to find the answer to, but I haven't
 found a satisfactory response yet. Why does Cassandra use a commit log for
 durability instead of writing directly to an SSTable? Cassandra achieves high
 write throughput because it stores data first in a memtable and then flushes
 it to disk. That sounds good, but remember that Cassandra also writes to the
 commit log for durability. I checked, and it is documented that the writes to
 the memtable and the commit log are synchronous, which means it first writes
 to the commit log, waits until that completes, and then writes to the memtable
 (or vice versa). Writing a transaction to the commit log requires an I/O
 operation, which means each insert needs an I/O :( for the commit log, and
 flushing the data to disk later requires more I/Os. Isn't writing to the
 commit log an overhead? Isn't it better to write data directly to disk
 instead of to the commit log?

 Remember that I/O operations are expensive, and reducing I/Os improves
 performance. If we look at an RDBMS, it stores data in its commit log as well
 as on disk. Fair enough, but if it didn't write to the commit log, its
 performance should be the same as Cassandra's, because it performs an I/O to
 insert data on disk and Cassandra also performs an I/O to insert data into
 the commit log. Is the commit log less expensive? I didn't really understand
 the magic :) Could you elaborate on it?

 Thank you in advance for your time. Looking to hear from you.

 Regards,
 Ansar Rafique
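A toy sketch of the write path this question is about, assuming a simple tab-separated text log (the real commit log format differs): each write is a sequential append to the end of the log plus an in-memory update, so no insert needs a seek to a sorted on-disk location, and the log is only read back to rebuild the memtable after a crash. The real commit log can also group syncs instead of syncing every write (the commitlog_sync setting in cassandra.yaml). The CommitLog, write and replay names are hypothetical.

```python
import os
import tempfile


class CommitLog:
    """Toy append-only commit log: one tab-separated key/value line per write.

    Keys and values must not contain tabs or newlines in this sketch.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self._f = open(path, "a", encoding="utf-8")

    def append(self, key: str, value: str, sync: bool = True) -> None:
        # Always written at the end of the file: a sequential append, with no
        # seek to wherever this key would live in a sorted data file.
        self._f.write(f"{key}\t{value}\n")
        self._f.flush()
        if sync:
            os.fsync(self._f.fileno())  # the write is durable once this returns

    def close(self) -> None:
        self._f.close()


def write(log: CommitLog, memtable: dict[str, str], key: str, value: str) -> None:
    log.append(key, value)  # 1) durable, sequential I/O
    memtable[key] = value   # 2) in-memory update, no disk I/O


def replay(path: str) -> dict[str, str]:
    """Rebuild the memtable after a crash by re-applying the log in order."""
    memtable: dict[str, str] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, value = line.rstrip("\n").split("\t", 1)
            memtable[key] = value
    return memtable


path = os.path.join(tempfile.mkdtemp(), "commitlog.txt")
log, memtable = CommitLog(path), {}
write(log, memtable, "user:2", "bob")
write(log, memtable, "user:1", "alice")
write(log, memtable, "user:2", "bobby")
log.close()
print(replay(path))  # {'user:2': 'bobby', 'user:1': 'alice'}
```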







-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: High performance disk io

2013-05-23 Thread Igor

Hello Christopher,

BTW, are you talking about 99th percentiles on the client side, or about
percentiles from the Cassandra histograms for a CF on the Cassandra side?


Thanks!

On 05/22/2013 05:41 PM, Christopher Wirt wrote:


Hi Igor,

Yeah, same here: 15 ms for the 99th percentile is our max. Currently we get
one or two ms for most CFs. It goes up at peak times, which is what we
want to avoid.


We're using Cassandra 1.2.4 with vnodes and our own barebones driver on top of
Thrift. It needed to be .NET, so Hector and Astyanax were not options.


Do you use SSDs or multiple SSDs in any kind of configuration or RAID?

Thanks

Chris

*From:* Igor [mailto:i...@4friends.od.ua]
*Sent:* 22 May 2013 15:07
*To:* user@cassandra.apache.org
*Subject:* Re: High performance disk io

Hello

What level of read performance do you expect? We have a limit of 15 ms for
the 99th percentile, with average read latency near 0.9 ms. For some CFs the
99th percentile is actually 2 ms, for others 10 ms; this depends on the data
volume you read in each query.


Tuning read performance involved cleaning up the data model, tuning
cassandra.yaml, switching from Hector to Astyanax, and tuning OS parameters.


On 05/22/2013 04:40 PM, Christopher Wirt wrote:

Hello,

We're looking at deploying a new ring where we want the best
possible read performance.

We've set up a cluster with 6 nodes, replication factor 3, 32 GB of
memory, 8 GB heap and an 800 MB key cache, each node holding 40-50 GB of data
on a 200 GB SSD, plus a 500 GB SATA disk for the OS and commit log.

Three column families

ColFamily1 50% of the load and data

ColFamily2 35% of the load and data

ColFamily3 15% of the load and data

At the moment we are still seeing around 20% disk utilisation, and
occasionally as high as 40-50% on some nodes at peak time. We are
conducting some semi-live testing.

CPU looks fine, memory is fine, and the key cache hit rate is about 80%
(it could be better, so maybe we should increase the key cache size?)
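A rough back-of-the-envelope sketch of why the hit rate matters, assuming each key cache miss costs one extra lookup in the on-disk index (the OS page cache may absorb some of these) and using a made-up rate of 10,000 reads/s per node. Under that assumption, going from an 80% to a 90% hit rate halves the extra index lookups.

```python
def key_cache_misses_per_sec(reads_per_sec: float, hit_rate: float) -> float:
    """Reads per second that miss the key cache (simplified: each miss is
    assumed to cost one extra lookup in the on-disk index)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return reads_per_sec * (1.0 - hit_rate)


READS_PER_SEC = 10_000  # made-up per-node read rate, for illustration only
for rate in (0.80, 0.90, 0.95):
    misses = key_cache_misses_per_sec(READS_PER_SEC, rate)
    print(f"hit rate {rate:.0%}: ~{misses:,.0f} index lookups/s")
# hit rate 80%: ~2,000 index lookups/s
# hit rate 90%: ~1,000 index lookups/s
# hit rate 95%: ~500 index lookups/s
```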

Anyway, we're looking into what we can do to improve this.

One conversation we are having at the moment is around the SSD disk
setup.

We are considering moving to have 3 smaller SSD drives and
spreading the data across those.

The possibilities are:

-We have a RAID0 of the smaller SSDs and hope that improves
performance.

Will this actually yield better throughput?

-We mount the SSDs at different directories and define multiple
data directories in cassandra.yaml.

Will not having a RAID controller layer improve the throughput?

-We mount the SSDs at different column family directories and
have a single data directory declared in cassandra.yaml.

We think this is quite an attractive idea.

What are the drawbacks? Would the system column families end up on the main
SATA disk?

-We don't change anything and just keep increasing our key cache.

-Anything you guys can think of.

Ideas and thoughts welcome. Thanks for your time and expertise.

Chris





RE: High performance disk io

2013-05-23 Thread Christopher Wirt
Hi Igor,

 

I was talking about the 99th percentile from the Cassandra histograms when I
said '1 or 2 ms for most CFs'.

We have measured on the client side too, and generally see a couple of ms
added on top, as one might expect.
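Client-side and server-side percentiles measure different things: client-side numbers include network and driver time, while server-side histograms are bucketed and so only approximate. A minimal nearest-rank percentile sketch over raw client-side latency samples (the sample values are made up) shows how far the 99th percentile can sit from the mean:

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile of raw samples, for p in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank, always >= 1
    return ordered[rank - 1]


# Made-up client-side latencies in ms: mostly fast, with two slow outliers.
latencies_ms = [1.0] * 98 + [9.0, 15.0]
mean = sum(latencies_ms) / len(latencies_ms)
print(f"mean={mean:.2f} p50={percentile(latencies_ms, 50)} p99={percentile(latencies_ms, 99)}")
# mean=1.22 p50=1.0 p99=9.0
```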

 

For anyone interested in disk I/O (my original question): we have tried out
the multiple-SSD setup and found that it works well and reduces the impact of
a repair on node performance.

We ended up going with a single data directory in cassandra.yaml with one SSD
mounted on it, plus a dedicated SSD per large column family.

We're now moving all of our nodes to the same setup.
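With per-column-family SSD mounts it is easy to lose track of which directory actually sits on which device. A small sketch that compares os.stat st_dev values to flag paths sharing a filesystem, assuming the Cassandra 1.1+/1.2 layout of data_dir/keyspace/column_family (the helper names are hypothetical):

```python
import os


def devices_for(paths: list[str]) -> dict[str, int]:
    """Map each path to the device ID (st_dev) of the filesystem holding it."""
    return {p: os.stat(p).st_dev for p in paths}


def shares_device(paths: list[str]) -> list[tuple[str, str]]:
    """Return every pair of paths that live on the same device."""
    items = sorted(devices_for(paths).items())
    return [
        (a, b)
        for i, (a, dev_a) in enumerate(items)
        for b, dev_b in items[i + 1:]
        if dev_a == dev_b
    ]
```

For example, passing the configured data directory together with the directories of the large column families should return an empty list once each of them is on its own SSD.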

 

 

Chris

 


Re: High performance disk io

2013-05-23 Thread Edward Capriolo
I have used both rotational disks with lots of RAM and SSD devices. An
important thing to consider is that SSD devices are not magic. Big-O growth
still shows up in several places:
1) more data: larger bloom filters
2) more data: larger key caches and more JVM overhead
3) more requests: more young-gen JVM overhead
4) more data: longer compactions (even with SSDs)
5) more writes: more memtable flushing
Bottom line: more data means more disk seeks.
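To put numbers on point 1: memory for Bloom filters grows linearly with the number of keys. A sketch using the textbook optimal-size formula m = -n ln(p) / (ln 2)^2; Cassandra's own sizing differs in detail, and the key counts and 1% false-positive rate below are illustrative:

```python
import math


def bloom_filter_bits(n_keys: int, fp_rate: float) -> int:
    """Textbook optimal Bloom filter size: m = -n * ln(p) / (ln 2)^2 bits."""
    if n_keys < 0 or not 0 < fp_rate < 1:
        raise ValueError("need n_keys >= 0 and 0 < fp_rate < 1")
    return math.ceil(-n_keys * math.log(fp_rate) / math.log(2) ** 2)


# Illustrative key counts per node; 1% false-positive rate.
for n in (10_000_000, 100_000_000):
    mib = bloom_filter_bits(n, 0.01) / 8 / 2**20
    print(f"{n:>11,} keys -> {mib:,.1f} MiB")
```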

We have used both mid-level SSDs and the costly Fusion-io devices. Fitting the
data in RAM/VFS cache delivers better, more predictable low latency; even with
very fast disks, the average, 95th, and 99th percentiles can get very far
apart. I am currently trying to study the effect of the width of a row
(spread across multiple SSTables) on its 95th percentile read time.



Re: Cassandra 1.2 TTL histogram problem

2013-05-23 Thread Yuki Morishita
 Are you sure that it is a good idea to estimate remainingKeys like that?

Since we don't want to scan every row to check overlap and cause heavy
IO automatically, the method can only do a best-effort calculation.
In your case, try running a user-defined compaction on that sstable
file. It goes through every row and removes tombstones when they are droppable.


On Wed, May 22, 2013 at 11:48 AM, cem cayiro...@gmail.com wrote:
 Thanks for the answer.

 It means that if we use RandomPartitioner it will be very difficult to find
 an sstable without any overlap.

 Let me give you an example from my test.

 I have ~50 sstables in total and an sstable with a droppable ratio of 0.9. I
 use a GUID for each key and only insert (no updates or deletes), so I don't
 expect a key to appear in different sstables.

 I added extra logging to AbstractCompactionStrategy to see
 overlaps.size(), keys and remainingKeys:

 overlaps.size() is around 30, the number of keys for that sstable is around
 5 M, and remainingKeys is always 0.

 Are you sure that it is a good idea to estimate remainingKeys like that?

 Best Regards,
 Cem



 On Wed, May 22, 2013 at 5:58 PM, Yuki Morishita mor.y...@gmail.com wrote:

  Can method calculate non-overlapping keys as overlapping?

 Yes.
 And randomized keys don't matter here, since sstables are sorted by the
 token your partitioner calculates from each key, and the method uses
 each sstable's min/max token to estimate overlap.
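A simplified Java sketch of the min/max token overlap test described above. Tokens are modelled as non-negative longs rather than RandomPartitioner's 127-bit MD5 tokens, and `rangesOverlap` and `randomSSTableBounds` are hypothetical helpers, not Cassandra APIs. With keys hashed uniformly over the ring, any sstable holding many keys spans nearly the whole ring, so two sstables with no key in common still look overlapping.

```java
import java.util.Random;

class TokenOverlap {
    /** Closed ranges [minA, maxA] and [minB, maxB] overlap unless one ends before the other starts. */
    static boolean rangesOverlap(long minA, long maxA, long minB, long maxB) {
        return minA <= maxB && minB <= maxA;
    }

    /** Returns {min, max} of n random tokens, standing in for an sstable's first/last token. */
    static long[] randomSSTableBounds(Random rnd, int n) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (int i = 0; i < n; i++) {
            long token = rnd.nextLong() >>> 1; // non-negative, uniformly spread
            min = Math.min(min, token);
            max = Math.max(max, token);
        }
        return new long[] {min, max};
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        long[] a = randomSSTableBounds(rnd, 1000);
        long[] b = randomSSTableBounds(rnd, 1000);
        // Different random keys in each sstable, yet their min/max bounds overlap.
        System.out.println(rangesOverlap(a[0], a[1], b[0], b[1]));
    }
}
```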

 On Tue, May 21, 2013 at 4:43 PM, cem cayiro...@gmail.com wrote:
  Thank you very much for the swift answer.
 
  I have one more question about the second part. Can the method calculate
  non-overlapping keys as overlapping? I mean, it uses the max and min tokens
  and the column count. They can be very close to each other if random keys
  are used.
 
  In my use case I generate a GUID for each key and send a single write
  request.
 
  Cem
 
  On Tue, May 21, 2013 at 11:13 PM, Yuki Morishita mor.y...@gmail.com
  wrote:
 
   Why does Cassandra's single-sstable compaction skip the keys that are in
   the other sstables?
 
  Because we don't want to resurrect deleted columns. Say sstable A has
  the column with timestamp 1, and sstable B has the same column deleted
  at timestamp 2. If we purged that column only from sstable B, we would
  see the column with timestamp 1 again.
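A minimal Java model of the resurrection scenario, assuming simple last-write-wins reconciliation by timestamp (ties are ignored here). `Cell` and `reconcile` are illustrative names, not Cassandra classes.

```java
import java.util.List;
import java.util.Optional;

class Resurrection {
    /** One version of a column as stored in a single sstable (simplified). */
    static final class Cell {
        final long timestamp;
        final boolean tombstone;

        Cell(long timestamp, boolean tombstone) {
            this.timestamp = timestamp;
            this.tombstone = tombstone;
        }
    }

    /** Last-write-wins: the highest timestamp decides; a winning tombstone hides the column. */
    static Optional<Cell> reconcile(List<Cell> versions) {
        Cell winner = null;
        for (Cell c : versions) {
            if (winner == null || c.timestamp > winner.timestamp) {
                winner = c;
            }
        }
        return (winner == null || winner.tombstone) ? Optional.empty() : Optional.of(winner);
    }

    public static void main(String[] args) {
        Cell inA = new Cell(1, false); // sstable A: live column written at timestamp 1
        Cell inB = new Cell(2, true);  // sstable B: same column deleted at timestamp 2

        // Both sstables present: the tombstone wins and the column reads as deleted.
        System.out.println(reconcile(List.of(inA, inB)).isPresent()); // false

        // Tombstone purged from B only: A's old value comes back.
        System.out.println(reconcile(List.of(inA)).isPresent()); // true
    }
}
```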
 
   I also don't understand why we have this line in the
   worthDroppingTombstones method
 
  What the method is trying to do is guess how many columns are in the
  rows that don't overlap, without actually going through every row in
  the sstable. We have statistics such as the column count histogram and
  the min and max row token for every sstable, and the method uses those
  to estimate how many columns the two sstables overlap by.
  You may get a remainingColumnsRatio of 0 when the two sstables overlap
  almost entirely.
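A simplified Java sketch of the `remainingColumnsRatio` estimate quoted later in this thread, `columns / (getEstimatedColumnCount().count() * getEstimatedColumnCount().mean())`. It assumes every row holds the mean number of columns, so the ratio reduces to remainingKeys / keys. The threshold of 0.2, the final comparison, and the helper names are assumptions for illustration, not copied from AbstractCompactionStrategy. With cem's numbers (about 5 M keys, remainingKeys of 0), the estimate is 0 regardless of the 0.9 droppable ratio.

```java
class TombstoneRatioEstimate {
    // Assumed threshold for illustration; not necessarily the value Cassandra uses.
    static final double TOMBSTONE_THRESHOLD = 0.2;

    /**
     * Fraction of the sstable's columns estimated to sit in rows outside every overlapping
     * sstable's token range, assuming each row holds meanColumnsPerRow columns.
     */
    static double remainingColumnsRatio(long keys, long remainingKeys, double meanColumnsPerRow) {
        double columns = remainingKeys * meanColumnsPerRow; // columns in non-overlapping rows
        double total = keys * meanColumnsPerRow;            // histogram count * mean
        return total == 0 ? 0.0 : columns / total;
    }

    /** Hypothetical decision rule: compact only if enough droppable tombstones remain outside the overlap. */
    static boolean worthDroppingTombstones(double droppableRatio, double remainingColumnsRatio) {
        return remainingColumnsRatio * droppableRatio > TOMBSTONE_THRESHOLD;
    }

    public static void main(String[] args) {
        long keys = 5_000_000L;  // roughly cem's sstable
        long remainingKeys = 0L; // every key falls inside some overlapping sstable's token range
        double ratio = remainingColumnsRatio(keys, remainingKeys, 10.0); // 10 columns/row: hypothetical
        System.out.println(ratio);                               // 0.0
        System.out.println(worthDroppingTombstones(0.9, ratio)); // false
    }
}
```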
 
 
  On Tue, May 21, 2013 at 3:43 PM, cem cayiro...@gmail.com wrote:
   Hi all,
  
   I have a question about ticket
   https://issues.apache.org/jira/browse/CASSANDRA-3442
  
   Why does Cassandra's single-sstable compaction skip the keys that are in
   the other sstables? Please correct me if I am wrong.
  
   I also don't understand why we have this line in the
   worthDroppingTombstones method:
  
   double remainingColumnsRatio = ((double) columns) /
   (sstable.getEstimatedColumnCount().count() *
   sstable.getEstimatedColumnCount().mean());
  
   remainingColumnsRatio  is always 0 in my case and the droppableRatio
   is
   0.9. Cassandra skips all sstables which are already expired.
  
   This line was introduced by
   https://issues.apache.org/jira/browse/CASSANDRA-4022.
  
   Best Regards,
   Cem
 
 
 

-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: exception causes streaming to hang forever

2013-05-23 Thread Yuki Morishita
What kind of error does the other end of the stream (/10.10.42.36) report?

On Wed, May 22, 2013 at 5:19 PM, Hiller, Dean dean.hil...@nrel.gov wrote:
 Three nodes rolled on fine, but for the next two we see a remote node throw
 this exception every time we start over and bootstrap the node:

 ERROR [Streaming to /10.10.42.36:2] 2013-05-22 14:47:59,404 CassandraDaemon.java (line 132) Exception in thread Thread[Streaming to /10.10.42.36:2,5,main]
 java.lang.RuntimeException: java.io.IOException: Input/output error
     at com.google.common.base.Throwables.propagate(Throwables.java:160)
     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
     at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.IOException: Input/output error
     at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
     at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:405)
     at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:506)
     at org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:90)
     at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
     ... 3 more

 Does anyone have an idea what this is? Google doesn't really show any useful
 advice on this, and our node has not joined the ring yet, so I don't think we
 can run a repair just yet to avoid it and sync via another means. It seems
 that after a streaming failure, it never recovers. Any ideas?

 We are on Cassandra 1.2.2

 Thanks,
 Dean




-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: write time of CQL3 set items

2013-05-23 Thread Sylvain Lebresne
   Does anyone know a way I could expose the write time of set items?


Unfortunately, you currently cannot.
The problem is really just an API one. Since currently you can only ever
query a full collection, you cannot apply writeTime() to a single element,
and applying it to the whole collection doesn't make sense, since each
element has its own write time, as you said.

We'll likely allow querying individual elements of collections in the
future, at which point getting the write time of an individual element
will work. But let's say that today we just don't have a syntax to make
it work.

--
Sylvain


Re: Creating namespace and column family from multiple nodes concurrently

2013-05-23 Thread Emalayan Vairavanathan
Hi Arthur and Farraz,

Thank you for getting back to me.

I am trying to avoid synchronization among concurrent instances, which is why I 
prefer Option - 2. Furthermore, my application has a reasonable window between 
the initialization phase and the runtime phase, so as long as Cassandra can 
safely handle concurrent creation I should be fine.

Do you have any idea how Cassandra is going to handle concurrent namespace and 
column family creation (here all the instances are going to create the same 
namespace and column families concurrently)? 
        - Does Cassandra take a long time to agree on a final schema (in case 
Cassandra uses some sort of exponential back-off algorithm to handle schema 
conflicts)? 
        - Or is it going to result in schema conflicts that need manual 
intervention?
        - Or will this result in race conditions?
        - Or some other issues, e.g. memory/CPU/network bottlenecks?

Thank you
Emalayan



 From: Arthur Zubarev arthur.zuba...@aol.com
To: user@cassandra.apache.org; svemala...@yahoo.com 
Sent: Wednesday, 22 May 2013 8:07 PM
Subject: Re: Creating namespace and column family from multiple nodes 
concurrently
 


I am assuming here you want to sync all the 100s of nodes once the application 
is airborne. I suspect this would flood the network and even potentially affect 
the machine itself memory-wise. How are you going to maintain the nodes 
(compaction+repair)?


Regards,

Arthur




-Original Message-
From: Emalayan Vairavanathan svemala...@yahoo.com
To: user user@cassandra.apache.org
Sent: Wed, May 22, 2013 8:31 pm
Subject: Creating namespace and column family from multiple nodes concurrently


Hi all,

I am implementing a distributed application which runs on 100s of machines 
concurrently. This application is going to use Cassandra as the underlying storage.

The application creates the schema (namespace and column families) during the 
initialization phase. It seems I have two options to create the schema.

Option - 1: Using a single node for schema creation.
Option - 2: Having all the nodes (100s of them) run the same schema creation 
logic (first, each node checks whether the schema is already available, and then 
tries to create the schema if it is not).

To keep the initialization phase simple, I prefer to go for Option - 2. However, 
I am not sure how Cassandra is going to behave if multiple nodes try to create 
the same schema (namespace and column families) concurrently. It would be nice 
if someone could tell me about the implications of Option - 2 with Cassandra 
version 1.2.2.

Please let me know if you have any questions.

Thank you
VE
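A plain-Java simulation (no Cassandra client involved) of why the check-then-create step in Option - 2 does not by itself prevent concurrent CREATEs: if every instance runs its existence check before any CREATE lands, all of them see the schema as missing. The keyspace name `my_keyspace` and both helpers are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class CheckThenCreateRace {
    static final String KEYSPACE = "my_keyspace"; // hypothetical name

    /**
     * Worst-case interleaving for Option - 2: every instance runs its existence check
     * before any instance's CREATE lands. Returns how many CREATEs are issued.
     */
    static int createsIssued(int instances) {
        Set<String> schema = new HashSet<>(); // stand-in for the cluster's schema
        boolean[] sawMissing = new boolean[instances];
        for (int i = 0; i < instances; i++) {
            sawMissing[i] = !schema.contains(KEYSPACE); // phase 1: all checks
        }
        int creates = 0;
        for (int i = 0; i < instances; i++) {
            if (sawMissing[i]) {                         // phase 2: all creates
                schema.add(KEYSPACE);
                creates++;
            }
        }
        return creates;
    }

    /** Check and create run back to back, one instance at a time (e.g. a single creator). */
    static int createsIssuedSerialized(int instances) {
        Set<String> schema = new HashSet<>();
        int creates = 0;
        for (int i = 0; i < instances; i++) {
            if (!schema.contains(KEYSPACE)) {
                schema.add(KEYSPACE);
                creates++;
            }
        }
        return creates;
    }

    public static void main(String[] args) {
        System.out.println(createsIssued(100));           // 100
        System.out.println(createsIssuedSerialized(100)); // 1
    }
}
```

Serializing the check and the create, for example by letting one designated instance do it as in Option - 1, yields a single CREATE.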

Re: Creating namespace and column family from multiple nodes concurrently

2013-05-23 Thread Arthur Zubarev
Would each device/machine have its own keyspace?

Basically, your client needs to take care of successfully creating the 
schema and any other verification, and that is going to be time-consuming. 

From: Emalayan Vairavanathan 
Sent: Thursday, May 23, 2013 3:07 PM
To: user@cassandra.apache.org 
Subject: Re: Creating namespace and column family from multiple nodes 
concurrently


Re: Creating namespace and column family from multiple nodes concurrently

2013-05-23 Thread Emalayan Vairavanathan
Would each device/machine have its own keyspace?

No. All the machines are going to run exactly the same CQL commands and create 
the same namespace and column families.

Thank you
Emalayan



 From: Arthur Zubarev arthur.zuba...@aol.com
To: Emalayan Vairavanathan svemala...@yahoo.com; user@cassandra.apache.org 
Sent: Thursday, 23 May 2013 12:20 PM
Subject: Re: Creating namespace and column family from multiple nodes 
concurrently
 



Re: Creating namespace and column family from multiple nodes concurrently

2013-05-23 Thread Robert Coli
On Thu, May 23, 2013 at 12:07 PM, Emalayan Vairavanathan
svemala...@yahoo.com wrote:
 Do you have any idea how Cassandra is going to handle concurrent namespace
 and column family creation (Here all the instances are going to create the
 same namespace and column families concurrently)?
 [...]
 However I am not sure how Cassandra is going to behave if multiple nodes try
 to create the same schema (namespace and column families) concurrently. It
 would be nice if someone can tell me about the implications of Option - 2
 with Cassandra version 1.2.2.

Concurrent CREATE is allegedly working in 1.2.0, per NEWS.txt [1]. I
say allegedly working because this feature was also allegedly working
in 1.1.0. Given past experience, I continue to (perhaps
pessimistically) believe that frequent dynamic updates of schema are
likely to result in schema desynch. I would be interested to hear if
you go down this route and do not encounter problems.

See also CASSANDRA-3794 [2] for details.

=Rob

[1] https://github.com/apache/cassandra/blob/cassandra-1.2/NEWS.txt
[2] https://issues.apache.org/jira/browse/CASSANDRA-3794


Re: Creating namespace and column family from multiple nodes concurrently

2013-05-23 Thread Arthur Zubarev
So where are the multiple nodes? I am just puzzled.

From: Emalayan Vairavanathan 
Sent: Thursday, May 23, 2013 3:43 PM
To: Arthur Zubarev ; user@cassandra.apache.org 
Subject: Re: Creating namespace and column family from multiple nodes 
concurrently


Re: column with TTL of 10 seconds lives very long...

2013-05-23 Thread Robert Coli
On Wed, May 22, 2013 at 11:32 PM, Tamar Fraenkel ta...@tok-media.com wrote:

 I am using Hector HLockManagerImpl, which creates a keyspace named
 HLockManagerImpl and CF HLocks.
 For some reason I have a row with a single column that should have expired
 yesterday but is still there.
 I tried deleting it using cli, but it is stuck...
 Any ideas how to delete it?


"Is still there" is sorta ambiguous. Do you mean that clients see it, or
that it is still in the (immutable) data file it was previously in?

If the latter, what is gc_grace_seconds set to? Make sure it's set to a low
value, and then make sure that your TTL-expired key is compacted.
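A simplified Java model of Rob's point, assuming an expired column only becomes eligible to be dropped by compaction once gc_grace_seconds (default 864000, i.e. 10 days) have passed after its expiration. The exact boundary comparison is an assumption, and the helper names are hypothetical.

```java
class TtlPurge {
    static final int DEFAULT_GC_GRACE_SECONDS = 864_000; // 10 days

    /** Epoch second at which a column written at writeTimeSec with the given TTL expires. */
    static long expiresAt(long writeTimeSec, int ttlSec) {
        return writeTimeSec + ttlSec;
    }

    /** Simplified: compaction may drop an expired column only once gc_grace_seconds have also passed. */
    static boolean purgeableByCompaction(long writeTimeSec, int ttlSec, int gcGraceSec, long nowSec) {
        return expiresAt(writeTimeSec, ttlSec) + gcGraceSec < nowSec;
    }

    public static void main(String[] args) {
        long written = 0L;              // arbitrary epoch second
        long oneDayLater = written + 86_400L;
        // 10-second TTL with the default grace period: expired a day ago, still kept on disk.
        System.out.println(purgeableByCompaction(written, 10, DEFAULT_GC_GRACE_SECONDS, oneDayLater)); // false
        // Same column with gc_grace_seconds lowered to 0: the next compaction may drop it.
        System.out.println(purgeableByCompaction(written, 10, 0, oneDayLater)); // true
    }
}
```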

=Rob


Re: Cassandra read reapair

2013-05-23 Thread aaron morton
If you are reading and writing at CL QUORUM and getting inconsistent results, 
that sounds like a bug. If you are mixing the CL levels such that R + W <= N, 
then it's expected behaviour. 
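The overlap rule behind this remark, as a small Java sketch: with replication factor N, a read touching R replicas is guaranteed to include one that acknowledged a write to W replicas only when R + W > N. N = 3 is an assumption here; the thread does not state the replication factor.

```java
class QuorumOverlap {
    /** QUORUM is a strict majority of the N replicas. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** A read is guaranteed to overlap the latest acknowledged write only when R + W > N. */
    static boolean readSeesLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int n = 3;          // assumed replication factor
        int q = quorum(n);  // 2
        System.out.println(readSeesLatestWrite(q, q, n)); // QUORUM read, QUORUM write: true
        System.out.println(readSeesLatestWrite(1, q, n)); // ONE read, QUORUM write: false
        System.out.println(readSeesLatestWrite(n, q, n)); // ALL read, QUORUM write: true
    }
}
```

With writes at QUORUM, differing results at CL ONE are expected under this rule, while differing results at CL QUORUM are not.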


Can you reproduce the issue outside of your app ? 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/05/2013, at 8:55 PM, Kais Ahmed k...@neteck-fr.com wrote:

  Checking you do not mean the row key is corrupt and cannot be read. 
 Yes, I can read it, but reads do not all return the same result, except at CL ALL.
 
  By default in 1.x and beyond the read repair chance is 0.1, so it's 
  only enabled on 10% of requests. 
 You are right, read repair chance is set to 0.1, but I launched a read repair, 
 which did not solve the problem. Any idea?
 
 What CL are you writing at? 
 All writes are at CL QUORUM.
 
 Thank you, Aaron, for your answer. 
 
 
 2013/5/21 aaron morton aa...@thelastpickle.com
 Only some keys of one CF are corrupt. 
 Checking you do not mean the row key is corrupt and cannot be read. 
 
 I thought using CL ALL would correct the problem with READ REPAIR, but after 
 returning to CL QUORUM, the problem persists.
 
 
 By default in 1.x and beyond the read repair chance is 0.1, so it's 
 only enabled on 10% of requests. 
 
 
 In the absence of further writes all reads (at any CL) should return the same 
 value. 
 
 What CL are you writing at? 
 
 Cheers
 
 On 19/05/2013, at 1:28 AM, Kais Ahmed k...@neteck-fr.com wrote:
 
 Hi all,
 
 I encountered a consistency problem on some keys using phpcassa and 
 Cassandra 1.2.3 after a server crash.
 
 Only some keys of one CF are corrupt. 
 
 I launched a nodetool repair that completed successfully but did not correct 
 the issue.
 
 When I try to get a corrupt key with:
 
 CL ONE, the result contains 7 or 8 or 9 columns
 
 CL QUORUM, the result contains 8 or 9 columns
 
 CL ALL, the data is consistent and always returns 9 columns
 
 I thought using CL ALL would correct the problem with READ REPAIR, but after 
 returning to CL QUORUM, the problem persists.
 
 Thank you for your help


Re: Cassandra hangs on large hinted handoffs

2013-05-23 Thread Edward Capriolo
For some reason the 1.0.7 hints actually use a super column :)


On Thu, May 23, 2013 at 6:18 PM, aaron morton aa...@thelastpickle.com wrote:

 I know how this sounds, but upgrading to 1.1.11 is the best approach.
 1.0.x is not getting any fixes, 1.1.x is the most stable and still getting
 some patches, and 1.2 is stable and in use.

 Hint storage has been redesigned in 1.2.

 Any suggestions on how to make the cluster more tolerant to downtimes?

 Hints are always seen as an optimisation; their success or failure does
 not affect the consistency guarantees.

 If you are dealing with very high throughput, as a workaround you can
 reduce the time that hints are stored for a down node; see the yaml file
 for details.

 The behaviour changes if you have lots of small or large columns. This
 is the code from the hinted handoff manager that selects the page size:

 int pageSize = PAGE_SIZE;
 // read less columns (mutations) per page if they are very large
 if (hintStore.getMeanColumns() > 0)
 {
     int averageColumnSize = (int) (hintStore.getMeanRowSize() / hintStore.getMeanColumns());
     pageSize = Math.min(PAGE_SIZE, DatabaseDescriptor.getInMemoryCompactionLimit() / averageColumnSize);
     pageSize = Math.max(2, pageSize); // page size of 1 does not allow actual paging b/c of >= behavior on startColumn
     logger_.debug("average hinted-row column size is {}; using pageSize of {}", averageColumnSize, pageSize);
 }

 If you reduce the in_memory_compaction_limit_in_mb yaml setting, that will
 reduce the page size.
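To see how the in-memory compaction limit drives the page size, here is a standalone Java restatement of the arithmetic in the snippet above, with the inputs passed as parameters. PAGE_SIZE = 128 and the 64 MB limit are assumptions for illustration (check the source and your yaml), and the divide-by-zero guard is an addition not present in the original.

```java
class HintPageSize {
    // Hypothetical stand-in for the PAGE_SIZE constant in the snippet above.
    static final int PAGE_SIZE = 128;

    /** Mirrors the page-size arithmetic quoted above, with the inputs passed in explicitly. */
    static int pageSize(long meanRowSizeBytes, long meanColumns, long inMemoryCompactionLimitBytes) {
        int pageSize = PAGE_SIZE;
        if (meanColumns > 0) {
            // Guard against a zero divisor when columns average under one byte (not in the original).
            int averageColumnSize = (int) Math.max(1, meanRowSizeBytes / meanColumns);
            pageSize = (int) Math.min(PAGE_SIZE, inMemoryCompactionLimitBytes / averageColumnSize);
            pageSize = Math.max(2, pageSize); // never page with fewer than 2 columns
        }
        return pageSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Hints averaging 1 MB per column against an assumed 64 MB limit:
        System.out.println(pageSize(1000 * mb, 1000, 64 * mb)); // 64
        // Same hints after lowering the limit to 16 MB:
        System.out.println(pageSize(1000 * mb, 1000, 16 * mb)); // 16
        // Small 1 KB columns: capped by PAGE_SIZE.
        System.out.println(pageSize(1000 * 1024L, 1000, 64 * mb)); // 128
    }
}
```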

 Cheers

 On 21/05/2013, at 9:26 PM, Vladimir Volkov vlad.vol...@gmail.com wrote:

 Hello.

 I'm stress-testing our Cassandra (version 1.0.9) cluster, and tried
 turning off two of the four nodes for half an hour under heavy load. As a
 result I got a large volume of hints on the live nodes: HintsColumnFamily
 takes about 1.5 GB of disk space on each of the nodes. It seems these hints
 are never replayed successfully.

 After I bring the other nodes back online, tpstats shows active handoffs, but
 I can't see any writes on the target nodes.
 The log indicates memory pressure: the heap is 80% full (heap size is
 8GB total, 1GB young).

 A fragment of the log:
  INFO 18:34:05,513 Started hinted handoff for token: 1 with IP: /84.201.162.144
  INFO 18:34:06,794 GC for ParNew: 300 ms for 1 collections, 5974181760 used; max is 8588951552
  INFO 18:34:07,795 GC for ParNew: 263 ms for 1 collections, 6226018744 used; max is 8588951552
  INFO 18:34:08,795 GC for ParNew: 256 ms for 1 collections, 6559918392 used; max is 8588951552
  INFO 18:34:09,796 GC for ParNew: 231 ms for 1 collections, 6846133712 used; max is 8588951552
  WARN 18:34:09,805 Heap is 0.7978131149667941 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.
  WARN 18:34:09,805 Flushing CFS(Keyspace='test', ColumnFamily='t2') to relieve memory pressure
  INFO 18:34:09,806 Enqueuing flush of Memtable-t2@639524673(60608588/571839171 serialized/live bytes, 743266 ops)
  INFO 18:34:09,807 Writing Memtable-t2@639524673(60608588/571839171 serialized/live bytes, 743266 ops)
  INFO 18:34:11,018 GC for ParNew: 449 ms for 2 collections, 6573394480 used; max is 8588951552
  INFO 18:34:12,019 GC for ParNew: 265 ms for 1 collections, 6820930056 used; max is 8588951552
  INFO 18:34:13,112 GC for ParNew: 331 ms for 1 collections, 6900566728 used; max is 8588951552
  INFO 18:34:14,181 GC for ParNew: 269 ms for 1 collections, 7101358936 used; max is 8588951552
  INFO 18:34:14,691 Completed flushing /mnt/raid/cassandra/data/test/t2-hc-244-Data.db (56156246 bytes)
  INFO 18:34:15,381 GC for ParNew: 280 ms for 1 collections, 7268441248 used; max is 8588951552
  INFO 18:34:35,306 InetAddress /84.201.162.144 is now dead.
  INFO 18:34:35,306 GC for ConcurrentMarkSweep: 19223 ms for 1 collections, 3774714808 used; max is 8588951552
  INFO 18:34:35,309 InetAddress /84.201.162.144 is now UP

 After taking off the load and restarting the service, I still see pending
 handoffs:
 $ nodetool -h localhost tpstats
 Pool Name                    Active   Pending      Completed   Blocked  All time blocked
 ReadStage                         0         0        1004257         0                 0
 RequestResponseStage              0         0          92555         0                 0
 MutationStage                     0         0              6         0                 0
 ReadRepairStage                   0         0          57773         0                 0
 ReplicateOnWriteStage             0         0              0         0                 0
 GossipStage                       0         0         143332         0                 0
 AntiEntropyStage                  0         0              0         0                 0
 MigrationStage

Re: For those using Cassandra from .Net

2013-05-23 Thread aaron morton
Thanks, when and where is the talk?

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/05/2013, at 6:42 AM, Peter Lin wool...@gmail.com wrote:

 
 NativeX is giving a talk about using Cassandra with .Net. Our firm created a 
 port of Hector over to .Net late last year.
 
 Here is the abstract.
 
 The Perils and Triumphs of using Cassandra at a .NET/Microsoft Shop
 
 Speakers: Derek Bromenshenkel and Jeff Smoley, Infrastructure Architects at 
 NativeX
 
  
 NativeX (formerly W3i) recently transitioned a large portion of their backend 
 infrastructure from Microsoft SQL Server to Apache Cassandra. Today, its 
 Cassandra cluster backs its mobile advertising network supporting over 10 
 million daily active users that produce over 10,000 transactions per second 
 with an average database request latency of under 2 milliseconds. Come hear 
 our story about how we were successful at getting our .NET web apps to 
 reliably connect to Cassandra. Come learn about FluentCassandra, Snowflake, 
 Hector, and IKVM. It's a story of struggle and perseverance, where everyone 
 lives happily ever after.
 
 



Re: High performance disk io

2013-05-23 Thread aaron morton
  I am currently trying to really study the effect of the width of a row 
 (being in multiple sstables) vs its 95th percentile read time.
I'd be interested to see your findings. 

I use 3+ SSTables per read (from cfhistograms) as a warning sign to dig 
deeper into the data model. Also, the type of query affects the number of 
SSTables per read: queries by column name can short-circuit and may be served 
from (say) 0 or 1 SSTables even if the row is spread out. 
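The 3+ SSTables rule of thumb can be checked mechanically. A minimal sketch, assuming the SSTables column of cfhistograms has already been transcribed into a dict mapping "SSTables touched by a read" to "number of reads"; that dict and the example numbers are hypothetical, and the threshold of 3 is the one from the message above.

```python
def fraction_of_reads_over(sstables_per_read, threshold=3):
    """Return the fraction of reads that touched `threshold` or more SSTables.

    `sstables_per_read` maps "SSTables touched by a read" -> "number of reads"
    (a hypothetical hand transcription of the cfhistograms SSTables column).
    """
    total = sum(sstables_per_read.values())
    if total == 0:
        return 0.0
    heavy = sum(count for touched, count in sstables_per_read.items()
                if touched >= threshold)
    return heavy / total


# Hypothetical histogram: most reads hit 1-2 SSTables, a tail hits 3 or more.
histogram = {1: 70, 2: 20, 3: 7, 4: 3}
share = fraction_of_reads_over(histogram)
print(f"{share:.0%} of reads touched 3+ SSTables")  # prints "10% of reads touched 3+ SSTables"
```

A growing share here is the signal to look at the data model; as noted above, name-based column queries may still stay cheap even when this share is high.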

 -We don’t change anything and just keep upping our keycache.
 

800MB is a very high key cache and may result in poor GC performance, which is 
ultimately going to hurt your read latency. Pay attention to what GC is doing, 
both ParNew and CMS, and reduce the key cache if needed. When ParNew runs, the 
server is stalled. 
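Because ParNew pauses stall the server, one rough way to see their cost is the share of wall-clock time spent paused. A minimal sketch; the pause durations are hypothetical input (for example, read off GC logs), and it only counts the stop-the-world time you feed it, so CMS's concurrent phases, which do not stall the server for their whole duration, should not be included.

```python
def stall_fraction(pause_ms, window_s):
    """Fraction of a wall-clock window spent in stop-the-world GC pauses.

    pause_ms: pause durations in milliseconds observed during the window
              (hypothetical input, e.g. transcribed from GC logs).
    window_s: length of the observation window in seconds.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return sum(pause_ms) / 1000.0 / window_s


# Hypothetical minute with one 50 ms ParNew pause per second.
pauses = [50] * 60
print(f"stalled {stall_fraction(pauses, 60):.1%} of the time")  # prints "stalled 5.0% of the time"
```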

Cheers


On 24/05/2013, at 3:16 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 I have used both rotational disks with lots of RAM and SSD devices. An 
 important thing to consider is that SSD devices are not magic. Big-O growth 
 shows up in several places:
 1) more data -> larger bloom filters
 2) more data (larger key caches) -> more JVM overhead
 3) more requests -> more young gen JVM overhead
 4) more data -> longer compaction (even with SSD)
 5) more writes -> more memtable flushing
 Bottom line: more data -> more disk seeks
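Point 1 can be made concrete with the textbook sizing formula for a Bloom filter: m = -n ln(p) / (ln 2)^2 bits for n keys at false-positive probability p. This is a sketch of the general formula, not of Cassandra's exact allocation; the key counts and the 1% rate are illustrative assumptions.

```python
import math


def bloom_filter_bits(n_keys, fp_chance):
    """Bits needed by an optimally sized Bloom filter (textbook formula).

    m = -n * ln(p) / (ln 2)^2, where n is the number of keys and p is the
    target false-positive probability.
    """
    if n_keys < 0 or not 0 < fp_chance < 1:
        raise ValueError("need n_keys >= 0 and 0 < fp_chance < 1")
    return -n_keys * math.log(fp_chance) / (math.log(2) ** 2)


def bloom_filter_mib(n_keys, fp_chance):
    """Same estimate in MiB (8 bits per byte, 2**20 bytes per MiB)."""
    return bloom_filter_bits(n_keys, fp_chance) / 8 / 2**20


# Memory grows linearly with the key count: 10x the keys -> 10x the filter.
for keys in (10_000_000, 100_000_000, 1_000_000_000):
    print(f"{keys:>13,} keys at p=0.01: {bloom_filter_mib(keys, 0.01):8.1f} MiB")
```

At p = 0.01 this works out to roughly 9.6 bits per key, so the filter scales directly with the number of keys per node.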
 
 We have used both mid-level SSDs and the costly Fusion-io. Fitting in 
 RAM/VFS cache delivers better, more predictable low latency; even with very 
 fast disks the average, 95th, and 99th percentiles can get very far apart. 
 I am currently trying to really study the effect of the width of a row (being 
 in multiple sstables) vs its 95th percentile read time.
 
 
 On Thu, May 23, 2013 at 10:43 AM, Christopher Wirt chris.w...@struq.com 
 wrote:
 Hi Igor,
 
  
 
 I was talking about 99th percentile from the Cassandra histograms when I said 
 ‘1 or 2 ms for most cf’.
 
  
 
 But we have measured client side too and generally get a couple of ms added on 
 top, as one might expect.
 
  
 
 Anyone interested -
 
 Disk IO (my original question): we have tried out the multiple-SSD setup and 
 found that it works well and reduces the impact of a repair on node performance.
 
 We ended up going with a single data directory in cassandra.yaml with one SSD 
 mounted on it, plus a dedicated SSD per large column family.
 
 We’re now moving all of our nodes to the same setup.
 
  
 
  
 
 Chris
 
  
 
 From: Igor [mailto:i...@4friends.od.ua] 
 Sent: 23 May 2013 15:00
 To: user@cassandra.apache.org
 Subject: Re: High performance disk io
 
  
 
 Hello Christopher,
 
 BTW, are you talking about 99th percentiles on client side, or about 
 percentiles from cassandra histograms for CF on cassandra side?
 
 Thanks!
 
 On 05/22/2013 05:41 PM, Christopher Wirt wrote:
 
 Hi Igor,
 
  
 
 Yeah, same here: 15 ms at the 99th percentile is our max. Currently getting one 
 or two ms for most CFs. It goes up at peak times, which is what we want to avoid.
 
  
 
 We’re using Cassandra 1.2.4 with vnodes and our own barebones driver on top of 
 Thrift. It needed to be .NET, so Hector and Astyanax were not options.
 
  
 
 Do you use SSDs or multiple SSDs in any kind of configuration or RAID?
 
  
 
 Thanks
 
  
 
 Chris
 
  
 
 From: Igor [mailto:i...@4friends.od.ua] 
 Sent: 22 May 2013 15:07
 To: user@cassandra.apache.org
 Subject: Re: High performance disk io
 
  
 
 Hello
 
 What level of read performance do you expect? We have a limit of 15 ms at the 
 99th percentile, with average read latency near 0.9 ms. For some CFs the 99th 
 percentile is actually 2 ms, for others 10 ms; this depends on the data volume 
 you read in each query.
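A 99th-percentile ceiling paired with a much lower average, and the way the average, 95th and 99th percentiles drift apart, can both be illustrated with a small sketch. It uses the nearest-rank percentile definition on a hypothetical list of latencies in milliseconds; Cassandra's own histograms bucket latencies, so their percentiles will not match this exactly. The 15 ms limit is the one quoted above.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with >= pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def latency_report(samples_ms, p99_limit_ms=15.0):
    """Summarise read latencies and check the 99th percentile against a limit."""
    p99 = percentile(samples_ms, 99)
    return {
        "mean": statistics.mean(samples_ms),
        "p95": percentile(samples_ms, 95),
        "p99": p99,
        "p99_ok": p99 <= p99_limit_ms,
    }


# Hypothetical sample of 100 reads: mostly fast, with a slow tail.
samples = [0.8] * 90 + [2.0] * 5 + [9.0] * 4 + [40.0]
print(latency_report(samples))
```

Here the mean is about 1.6 ms while p99 is 9 ms and the slowest read is 40 ms, which is why a single average says little about tail latency.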
 
 Tuning read performance involved cleaning up the data model, tuning 
 cassandra.yaml, switching from Hector to Astyanax, and tuning OS parameters.
 
 On 05/22/2013 04:40 PM, Christopher Wirt wrote:
 
 Hello,
 
  
 
 We’re looking at deploying a new ring where we want the best possible read 
 performance.
 
  
 
 We’ve set up a cluster with 6 nodes and replication factor 3. Each node has 
 32 GB of memory, an 8 GB heap and an 800 MB key cache, and holds 40-50 GB of 
 data on a 200 GB SSD, with a 500 GB SATA disk for the OS and commit log.
 
 Three column families
 
 ColFamily1 50% of the load and data
 
 ColFamily2 35% of the load and data
 
 ColFamily3 15% of the load and data
 
  
 
 At the moment we are still seeing around 20% disk utilisation, occasionally 
 as high as 40-50% on some nodes at peak time. We are conducting some 
 semi-live testing.
 
 CPU looks fine, memory is fine, keycache hit rate is about 80% (could be 
 better, so maybe we should be increasing the keycache size?)
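The figures in this message allow some back-of-the-envelope arithmetic. A sketch, assuming "Gb" means gigabytes and that data is spread evenly across nodes: with replication factor 3 every row is stored on three nodes, and the RAM the 8 GB heap leaves free is only an upper bound on what the OS can use as page cache. On these numbers each node holds more data (40-50 GB) than it can cache (at most 24 GB), so some reads are likely to go to the SSD.

```python
def unique_data_gb(nodes, per_node_gb, replication_factor):
    """Unique (pre-replication) data in the ring: every row is stored RF times."""
    return nodes * per_node_gb / replication_factor


def page_cache_headroom_gb(ram_gb, heap_gb):
    """Rough upper bound on RAM left for the OS page cache after the JVM heap.

    Ignores off-heap structures and other processes, so the real figure is lower.
    """
    return ram_gb - heap_gb


# Figures from the message: 6 nodes, RF 3, 40-50 GB per node, 32 GB RAM, 8 GB heap.
for per_node in (40, 50):
    print(f"{per_node} GB/node -> {unique_data_gb(6, per_node, 3):.0f} GB unique data")
print(f"page cache headroom per node: <= {page_cache_headroom_gb(32, 8)} GB")
```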
 
  
 
 Anyway, we’re looking into what we can do to improve this.
 
  
 
 One conversation we are having at the moment is about the SSD setup.
 
  
 
 We are considering moving to have 3 smaller SSD drives and spreading the data 
 across those.
 
  
 
 The possibilities 

Re: Creating namespace and column family from multiple nodes concurrently

2013-05-23 Thread Emalayan Vairavanathan
I am sorry if I was not clear. I was using "nodes" to refer to machines (and 
vice versa).

Let me put it another way...

The application is composed of multiple instances of an executable. The 
application runs on multiple machines concurrently. All the instances are going 
to issue the same CQL commands and try to create exactly the same namespace and 
column families.

Thank you
Emalayan



 From: Arthur Zubarev arthur.zuba...@aol.com
To: Emalayan Vairavanathan svemala...@yahoo.com; user@cassandra.apache.org 
Sent: Thursday, 23 May 2013 1:15 PM
Subject: Re: Creating namespace and column family from multiple nodes 
concurrently
 


So where are the multiple nodes? I am just puzzled.
From: Emalayan Vairavanathan 
Sent: Thursday, May 23, 2013 3:43 PM
To: Arthur Zubarev ; user@cassandra.apache.org 
Subject: Re: Creating namespace and column family from multiple nodes concurrently

 Would each device/machine have its own keyspace?
 
No. All the machines are going to run exactly the same CQL commands and create 
the same namespace and column families.
 
Thank you
Emalayan
 


 From: Arthur Zubarev arthur.zuba...@aol.com
To: Emalayan Vairavanathan svemala...@yahoo.com; user@cassandra.apache.org 
Sent: Thursday, 23 May 2013 12:20 PM
Subject: Re: Creating namespace and column family from multiple nodes concurrently

 
Would each device/machine have its own keyspace?
 
Basically, your client needs to make sure the schema was created successfully, 
along with any other verification, and that is going to be time-consuming.  
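One way a client can take care of schema creation is to issue the DDL once and then wait for schema agreement before continuing. A minimal sketch, assuming a hypothetical `fetch_versions` callable that returns a map from schema version to the hosts reporting it (the shape returned by the Thrift `describe_schema_versions` call); how you obtain that map depends on your client library.

```python
import time


def wait_for_schema_agreement(fetch_versions, timeout_s=30.0, poll_s=0.5,
                              sleep=time.sleep, clock=time.monotonic):
    """Poll until every node reports the same schema version.

    `fetch_versions` is a hypothetical callable returning a dict that maps a
    schema version string to the list of hosts reporting it. Returns the agreed
    version, or raises TimeoutError if the nodes still disagree after timeout_s.
    """
    deadline = clock() + timeout_s
    while True:
        versions = fetch_versions()
        if len(versions) == 1:  # exactly one version: all reporting hosts agree
            return next(iter(versions))
        if clock() >= deadline:
            raise TimeoutError(f"schema disagreement: {sorted(versions)}")
        sleep(poll_s)


# Hypothetical fetcher: disagreement on the first poll, agreement on the second.
responses = iter([
    {"v1": ["10.0.0.1"], "v2": ["10.0.0.2", "10.0.0.3"]},
    {"v2": ["10.0.0.1", "10.0.0.2", "10.0.0.3"]},
])
print(wait_for_schema_agreement(lambda: next(responses), poll_s=0.01))  # prints v2
```

The `sleep` and `clock` parameters are there only so the loop can be exercised without real waiting.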
From: Emalayan Vairavanathan 
Sent: Thursday, May 23, 2013 3:07 PM
To: user@cassandra.apache.org 
Subject: Re: Creating namespace and column family from multiple nodes concurrently

Hi Arthur and Farraz,

Thank you for getting back to me.

I am trying to avoid synchronization among concurrent instances, and this is 
why I prefer Option 2. Further, in my application I have a reasonable window 
between the application initialization phase and the application runtime. So 
as long as Cassandra can safely handle concurrent creation, I should be fine.

Do you have any idea how Cassandra is going to handle concurrent namespace and 
column family creation (here all the instances are going to create the same 
namespace and column families concurrently)?

- Does Cassandra take long to agree on a final schema (in case Cassandra uses 
  some sort of exponential backoff algorithm to handle schema conflicts)?
- Or is it going to result in schema conflicts that need manual intervention?
- Or will this result in race conditions?
- Or some other issues, e.g. memory/CPU/network bottlenecks?

Thank you
Emalayan
 


 From: Arthur Zubarev arthur.zuba...@aol.com
To: user@cassandra.apache.org; svemala...@yahoo.com 
Sent: Wednesday, 22 May 2013 8:07 PM
Subject: Re: Creating namespace and column family from multiple nodes concurrently

 
I am assuming here you want to sync all the 100s of nodes once the application 
is airborne. I suspect this would flood the network and potentially even affect 
the machine itself memory-wise. How are you going to maintain the nodes 
(compaction + repair)? 
 
Regards,

Arthur


 
 
-----Original Message-----
From: Emalayan Vairavanathan svemala...@yahoo.com
To: user user@cassandra.apache.org
Sent: Wed, May 22, 2013 8:31 pm
Subject: Creating namespace and column family from multiple nodes concurrently


Hi all,
 
I am implementing a distributed application which runs on 100s of machines 
concurrently. This application is going to use Cassandra as the underlying 
storage.
 
The application creates the schema (namespace and column families) during the 
initialization phase. It seems I have two options to create the schema.

Option 1: Using a single node for schema creation.

Option 2: Having all the nodes (100s of them) run the same schema creation 
logic (first, each node checks whether the schema is already available, and 
then tries to create the schema if it is not).
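The check-then-create step in Option 2 is not atomic: nothing stops every instance from seeing "schema missing" before any of them creates it. A toy simulation of that worst case, using a hypothetical in-memory `FakeSchemaStore` rather than Cassandra and a barrier to force the interleaving. It says nothing about how Cassandra 1.2.2 itself resolves concurrent DDL, only that the existence check alone does not prevent duplicate create attempts.

```python
import threading


class FakeSchemaStore:
    """Hypothetical stand-in for the cluster's schema (not a Cassandra API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.keyspaces = set()
        self.create_attempts = 0

    def exists(self, name):
        with self._lock:
            return name in self.keyspaces

    def create(self, name):
        with self._lock:
            self.create_attempts += 1
            self.keyspaces.add(name)


def option2_instance(store, barrier, name):
    # Option 2: every instance checks first, then creates if it saw nothing.
    missing = not store.exists(name)
    barrier.wait()  # worst case: all checks finish before any create runs
    if missing:
        store.create(name)


def run_option2(n_instances, name="my_keyspace"):
    store = FakeSchemaStore()
    barrier = threading.Barrier(n_instances)
    threads = [threading.Thread(target=option2_instance, args=(store, barrier, name))
               for _ in range(n_instances)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.create_attempts


print(run_option2(5))  # prints 5: every instance saw "missing" and tried to create
```

With Option 1 only the designated instance issues the DDL, so at most one create attempt is made, at the cost of the other instances having to wait for it.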
 
To keep the initialization phase simple, I prefer to go with Option 2. However, 
I am not sure how Cassandra is going to behave if multiple nodes try to create 
the same schema (namespace and column families) concurrently. It would be nice 
if someone could tell me about the implications of Option 2 with Cassandra 
version 1.2.2.

Please let me know if you have any questions.

Thank you
VE