Re: do I need to add more nodes? minor compaction eat all IO

2011-07-26 Thread Jim Ancona
On Mon, Jul 25, 2011 at 6:41 PM, aaron morton aa...@thelastpickle.com wrote:
 There are no hard and fast rules to add new nodes, but here are two 
 guidelines:

 1) Single node load is getting too high, rule of thumb is 300GB is probably 
 too high.

What is that rule of thumb based on? I would guess that working set
size would matter more than absolute size. Why isn't that the case?

Jim


Re: do I need to add more nodes? minor compaction eat all IO

2011-07-25 Thread Yan Chunlu
I am using ordinary SATA disks. What I was actually worried about is
whether it is okay for Cassandra to use all of the IO resources every time.
Furthermore, when is a good time to add more nodes, given that I am
using ordinary SATA disks that reach 100 %util at around 100 r/s?

How much data should each node hold?


Below is my iostat -x 2 output during a node repair. I have to repair
each column family separately, otherwise the load gets even worse:

Device:  rrqm/s  wrqm/s     r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz    await  r_await   w_await  svctm   %util
sda        1.50    1.50  121.50  14.00   3.68   0.30     60.19    116.98  1569.46    59.49  14673.86   7.38  100.00
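
As a rough cross-check of the row above (a sketch, not part of the original report): iostat's await is the request-weighted average of r_await and w_await, so the reported values can be recombined to see which direction dominates the wait. The function name is made up for illustration; the numbers are copied from the sda row.

```python
# Values copied from the sda row of the iostat output above.
reads_per_s, writes_per_s = 121.50, 14.00
r_await_ms, w_await_ms = 59.49, 14673.86


def combined_await(r_s, w_s, r_await, w_await):
    """Average wait per request, weighting each direction by its request rate."""
    return (r_s * r_await + w_s * w_await) / (r_s + w_s)


await_ms = combined_await(reads_per_s, writes_per_s, r_await_ms, w_await_ms)

# Share of the total waiting time that is spent on writes.
write_wait = writes_per_s * w_await_ms
read_wait = reads_per_s * r_await_ms
write_share = write_wait / (read_wait + write_wait)

print(f"await ~ {await_ms:.2f} ms, writes account for {write_share:.0%} of the wait")
```

With these values the recombined figure agrees with the reported await of 1569.46 ms, and almost all of the waiting comes from the 14 writes per second rather than from the reads.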









Re: do I need to add more nodes? minor compaction eat all IO

2011-07-25 Thread Yan Chunlu
As the wiki suggests:
http://wiki.apache.org/cassandra/LargeDataSetConsiderations
Adding nodes is a slow process if each node is responsible for a large
amount of data. Plan for this; do not try to throw additional hardware
at a cluster at the last minute.


I would really like to know the status of my cluster and whether this is normal.






Re: do I need to add more nodes? minor compaction eat all IO

2011-07-25 Thread aaron morton
There are no hard and fast rules to add new nodes, but here are two guidelines:

1) Single node load is getting too high, rule of thumb is 300GB is probably too 
high. 
2) There are times when the cluster cannot keep up with throughput, for example 
the client is getting TimedOutExceptions or TPStats is showing consistently 
high (a multiple of the available threads) read or write pending queues. 
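
The two guidelines can be written down as a simple check. This is only a sketch under assumptions: the function name, its parameters, and the default of 2x the thread count for "a multiple of the available threads" are hypothetical, and it looks at a single sample, whereas the guideline is about pending queues that stay high over time.

```python
def reasons_to_add_nodes(load_gb, pending_tasks, pool_threads, timeouts_seen,
                         load_limit_gb=300, pending_multiple=2):
    """Return the guideline-based reasons (possibly none) for adding nodes.

    load_limit_gb is the 300GB rule of thumb from the message above;
    pending_multiple is an assumed threshold, not a Cassandra setting.
    """
    reasons = []
    if load_gb > load_limit_gb:
        reasons.append(f"node load {load_gb} GB exceeds ~{load_limit_gb} GB")
    if timeouts_seen:
        reasons.append("clients are seeing TimedOutExceptions")
    if pending_tasks >= pending_multiple * pool_threads:
        reasons.append(f"{pending_tasks} pending tasks vs {pool_threads} threads")
    return reasons


# Hypothetical sample values, not measurements from this thread.
quiet_node = reasons_to_add_nodes(load_gb=20, pending_tasks=3,
                                  pool_threads=32, timeouts_seen=False)
busy_node = reasons_to_add_nodes(load_gb=350, pending_tasks=200,
                                 pool_threads=32, timeouts_seen=True)
print(quiet_node)
print(busy_node)
```

An empty list means neither guideline is triggered; it does not by itself mean the cluster is meeting its SLA.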

What works for you will be what keeps your site running and keeps the ops/dev 
team sleeping at night.   

In your case, high IO during repair may be OK if the cluster can keep up with 
demand. Or it may mean you need to upgrade the IO capacity or add nodes. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

 
 



RE: do I need to add more nodes? minor compaction eat all IO

2011-07-24 Thread Francois Richard
Jonathan,

Are you sure that the reads done for compaction are sequential with Cassandra 
0.6.13?  This is not what I am observing right now.  During a minor compaction 
I usually observe ~ 1500 to 1900 r/s while rMB/s is barely around 30 to 35MB/s.

Just asking out of curiosity.
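
To put numbers on that observation, the average read size follows from dividing rMB/s by r/s. A minimal sketch (the function name is made up; the inputs are the ranges quoted above):

```python
def avg_read_kb(reads_per_s, read_mb_per_s):
    """Average size of one read request in KB, from iostat's r/s and rMB/s."""
    return read_mb_per_s * 1024 / reads_per_s


# The four corners of the observed ranges: 1500-1900 r/s at 30-35 MB/s.
for r_s in (1500, 1900):
    for mb_s in (30, 35):
        print(f"{r_s} r/s at {mb_s} MB/s -> {avg_read_kb(r_s, mb_s):.1f} KB per read")
```

Reads of a few tens of KB each are suggestive of a disk that is not streaming large contiguous chunks, though request size alone does not prove the access pattern is random.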


FR



Re: do I need to add more nodes? minor compaction eat all IO

2011-07-24 Thread Jonathan Ellis
It's sequential per sstable.  The more sstables you compact at once,
the less closely the overall read pattern approximates completely
sequential.
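
A toy model of that effect, as a sketch only: each input is read front to back, but a k-way merge keeps hopping between inputs, and the number of hops grows with the number of inputs. This is not Cassandra's compaction code; the helper names and the random key layout are assumptions, and real compaction reads in buffered chunks, so the model overstates the number of seeks.

```python
import heapq
import random


def tag(run, source):
    """Pair every key of one sorted run with the index of the run it came from."""
    return [(key, source) for key in run]


def source_switches(runs):
    """Count how often a k-way merge moves from one input run to another."""
    merged = heapq.merge(*(tag(run, i) for i, run in enumerate(runs)))
    sources = [source for _, source in merged]
    return sum(1 for a, b in zip(sources, sources[1:]) if a != b)


def split_keys(total_keys, run_count, seed=0):
    """Scatter keys 0..total_keys-1 across run_count sorted runs (toy data)."""
    rng = random.Random(seed)
    runs = [[] for _ in range(run_count)]
    for key in range(total_keys):
        rng.choice(runs).append(key)  # keys arrive in order, so each run stays sorted
    return runs


for run_count in (1, 2, 8, 32):
    switches = source_switches(split_keys(10_000, run_count))
    print(f"{run_count:>2} runs: {switches} switches between inputs")
```

With a single input there are no switches at all; with many inputs nearly every key comes from a different input than the previous one.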





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


RE: do I need to add more nodes? minor compaction eat all IO

2011-07-23 Thread Francois Richard
This really depends on your disk setup.

When you run iostat under high load, do you see a high number of r/s while 
rMB/s stays low?

I usually use:

iostat -x -m sdb sdc 1

to monitor situations like this.


In my case my disk setup is the following:

OS -- /sda
Cassandra CommitLogs -- /sdb
Cassandra Data -- /sdc

My understanding is that during compaction cassandra does a lot of non 
sequential reads then dumps the results with a big sequential write.

Is your application mostly doing writes and few reads, or the other way 
around?


FR

-Original Message-
From: Yan Chunlu [mailto:springri...@gmail.com] 
Sent: Saturday, July 23, 2011 9:16 AM
To: cassandra-u...@incubator.apache.org
Subject: do I need to add more nodes? minor compaction eat all IO

I have three nodes and RF=3. Every time a minor compaction runs, the CPU 
load (8 cores) gets to 30 and iostat -x 2 shows %util at 100%. Does that mean I 
need more nodes?  The total data size is 60G.

thanks!

--


Re: do I need to add more nodes? minor compaction eat all IO

2011-07-23 Thread Jonathan Ellis
On Sat, Jul 23, 2011 at 4:16 PM, Francois Richard frich...@xobni.com wrote:
 My understanding is that during compaction cassandra does a lot of non 
 sequential reads then dumps the results with a big sequential write.

Compaction reads and writes are both sequential, and 0.8 allows
setting a MB/s to cap compaction at.
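
The idea behind such a cap can be sketched as follows: after each chunk of work, pause long enough that the average rate falls back to the limit. This is an illustration only, not Cassandra's implementation; the function name is made up. As far as I know the 0.8 setting lives in cassandra.yaml as compaction_throughput_mb_per_sec, but check the configuration shipped with your version.

```python
def throttle_delay(bytes_done, elapsed_s, cap_mb_per_s):
    """Seconds to pause so that the average rate drops back to the cap."""
    # Time the work should have taken if it had run exactly at the cap.
    target_s = bytes_done / (cap_mb_per_s * 1024 * 1024)
    return max(0.0, target_s - elapsed_s)


MB = 1024 * 1024

# 64 MB processed in 1 s against a 16 MB/s cap: too fast, so pause.
print(throttle_delay(64 * MB, elapsed_s=1.0, cap_mb_per_s=16))

# 8 MB processed in 1 s against the same cap: already under it, no pause.
print(throttle_delay(8 * MB, elapsed_s=1.0, cap_mb_per_s=16))
```

A lower cap leaves more disk bandwidth for client reads and writes at the cost of compactions taking longer to finish.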

As to the original question, "do I need to add more machines?", I'd say
that depends more on whether your application's SLA is met than on what
% io util spikes to.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com