Re: Knowing when there is a *real* need to add nodes
Considering disk usage is a tricky one. Compacted SSTable files remain on disk until either there is not enough space or the JVM GC runs. To measure the live space, use "Space used (live)" from cfstats; "Space used (total)" also includes space that has been compacted but not yet deleted from disk.

The data in deleted columns *may* be purged from disk during a minor or major compaction, and this can happen before GCGraceSeconds has expired. It is only the tombstone that must be kept around for at least GCGraceSeconds.

I agree that 50% utilisation on the data directories is a sensible soft limit that will help keep you out of trouble. The space needed by a compaction depends on which bucket of files it is compacting, but it will always require at least as much free disk space as the files it is compacting. That should also leave headroom for adding new nodes, just in case. Ideally, when adding new nodes, existing nodes only stream data to the new nodes. If, however, you are increasing the node count by less than a factor of 2, you may need to make multiple moves and the nodes may need additional space.

To gauge the throughput I would also look at the latency trackers on the o.a.c.db.StorageProxy MBean. They track the latency of complete requests, including talking to the rest of the cluster; the metrics on the individual column families are concerned with the local read only.

For the pending tpstats, I would guess that for the read and write pools a pending value consistently higher than the number of threads assigned (in the config) would be something to investigate. Waiting in these stages will be reflected in the StorageProxy latency numbers. HintedHandoff, StreamStage and AntiEntropyStage will have tasks that stay in the pending queue for a while. AFAIK the other pools should not have many (< 10) tasks pending and should be able to clear the pending queue.

Hope that helps.
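The "Space used (live)" figure is the one worth trending per column family. A minimal sketch for scraping it, assuming the "Space used (live): <bytes>" cfstats line format quoted in this thread:

```shell
# Sketch: extract the live (non-compacted) byte counts from cfstats
# output. "Space used (total)" would overstate usage because it still
# includes compacted-but-not-yet-deleted files.
live_space_bytes() {
  grep 'Space used (live)' | awk -F': ' '{print $2}'
}

# Normally fed from the real tool, e.g.:
#   nodetool -h localhost cfstats | live_space_bytes
# Example against a captured cfstats line:
echo 'Space used (live): 1234567' | live_space_bytes
```

Feeding the per-CF values into a graphing tool over time gives the continuous view Tomer asks for below, rather than a point-in-time df reading.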
- Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 18 May 2011, at 19:50, Tomer B wrote:
> As for static disk usage i would add this: [...]
Re: Knowing when there is a *real* need to add nodes
As for static disk usage I would add this:

test: df -kh
description: run the test after compaction (check GCGraceSeconds in storage-conf.xml), as only then is data expunged permanently. Run it on the data disk; this assumes the commitlog disk is separate from the data dir.
green gauge: used space < 30% of disk capacity
yellow gauge: used space 30% - 50% of disk capacity
red gauge: used space > 50% of disk capacity
comments: Compactions can temporarily require up to 100% of the in-use space in the worst case (data file dir). When approaching 50% or more of disk capacity, use RAID 0 for the data dir disk; if you cannot, try increasing your disk; if you cannot, consider adding nodes (or consider adding nodes first, if that is what you prefer).

2011/5/12 Watanabe Maki
> It's interesting topic for me too. [...]
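The df gauge above is easy to script. A sketch using the thresholds from this thread (the data directory path is an assumption; adjust for your install):

```shell
# Hypothetical helper: classify data-directory usage into the
# green/yellow/red bands proposed in this thread (<30%, 30-50%, >50%).
classify_disk_usage() {
  local pct=$1   # used-space percentage of the data disk
  if [ "$pct" -lt 30 ]; then
    echo green
  elif [ "$pct" -le 50 ]; then
    echo yellow
  else
    echo red
  fi
}

# In practice the percentage would come from df, e.g.:
#   pct=$(df -k /var/lib/cassandra/data | awk 'NR==2 {gsub("%","",$5); print $5}')
classify_disk_usage 25   # green
classify_disk_usage 40   # yellow
classify_disk_usage 72   # red
```

Running it after a compaction (as the description says) avoids counting SSTables that are about to be removed.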
Re: Knowing when there is a *real* need to add nodes
It's an interesting topic for me too. How about adding measurements of static disk utilization (% used) and memory utilization (RSS, JVM heap, JVM GC)?

maki

From iPhone

On 2011/05/12, at 0:49, Tomer B wrote:
> Hi [...]
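Along the lines Maki suggests, RSS and heap occupancy can be sampled from the shell. A sketch, not a drop-in monitor; the process-lookup pattern and the idea of summing jstat figures are assumptions:

```shell
# Illustrative sampling of the memory metrics Maki mentions:
#   pid=$(pgrep -f CassandraDaemon)
#   ps -o rss= -p "$pid"           # resident set size, in KB
#   jstat -gcutil "$pid" 5000      # heap-region occupancy and GC time, every 5s

# Hypothetical helper: heap utilization as a percentage, given used
# and capacity figures in the same unit (e.g. KB summed from jstat -gc).
heap_pct() {
  local used=$1 cap=$2
  echo $(( used * 100 / cap ))
}

heap_pct 6144 8192   # -> 75
```

A heap that stays near capacity between GC cycles is the kind of signal that, graphed over time, says "more nodes (or more heap) soon" before latency degrades.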
Knowing when there is a *real* need to add nodes
Hi,

I'm trying to predict when my cluster will soon need new nodes added. I want a continuous graph telling me about my cluster's health, so that when I see the cluster becoming busier (I want numbers & measurements) I can tell I need to start purchasing more machines and getting them into the cluster; I want to know that beforehand. I'm writing down here what I came up with after doing some research over the net. I would highly appreciate any additional gauge measurements and ranges for testing cluster health and knowing beforehand when I'm going to need more nodes. Although I'm writing down green/yellow/red gauges, I'm also trying to find a continuous graph showing where our cluster stands (as much as possible...).

Also, my recommendations before adding new nodes are always:

1. Make sure all nodes are balanced, and if not, balance them.
2. Separate the commit log drive from the data (SSTables) drive.
3. Use disk_access_mode: mmap_index_only rather than auto.
4. Increase disk IO capacity if possible.
5. Avoid swapping as much as possible.

As for my gauge tests for when to add new nodes:

test: nodetool tpstats -h <host>
green gauge: no pending column with a number higher than 100
yellow gauge: pending columns 100-2000
red gauge: larger than 3000

test: iostat -x -n -p -z 5 10 and iostat -xcn 5
green gauge: kw/s + kr/s is below 25% of the disk's IO capacity
yellow gauge: 25%-50%
red gauge: 50%+

test: iostat -x -n -p -z 5 10 and check the %b column
green gauge: less than 10%
yellow gauge: 10%-80%
red gauge: 80%+

test: nodetool cfstats --host localhost
green gauge: the "SSTable count" item does not continually grow over time
red gauge: the "SSTable count" item continually grows over time

test: ./nodetool cfstats --host localhost | grep -i pending
green gauge: 0-2
yellow gauge: 3-100
red gauge: 101+

I would highly appreciate any additional gauge measurements and ranges in order to test my cluster health and to know ***beforehand*** when I'm going to soon need more nodes.
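The tpstats gauge above can be sketched as a small classifier. The thresholds come from this thread (the 2000-3000 range is left unspecified there, so it is treated as yellow here), and the awk column position assumes the tpstats layout of this Cassandra era:

```shell
# Hypothetical helper: classify the largest "pending" value from
# tpstats into the bands proposed in this thread
# (<100 green, >3000 red, yellow in between).
classify_pending() {
  local pending=$1
  if [ "$pending" -lt 100 ]; then
    echo green
  elif [ "$pending" -gt 3000 ]; then
    echo red
  else
    echo yellow
  fi
}

# In practice, feed it the maximum pending count across pools, e.g.:
#   max_pending=$(nodetool -h localhost tpstats | awk 'NR>1 {if ($3>m) m=$3} END {print m}')
classify_pending 10     # green
classify_pending 500    # yellow
classify_pending 4000   # red
```

As Aaron notes above, for the read and write pools a pending value consistently above the configured thread count is worth investigating even if it has not yet reached these bands.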