Re: Persistently increasing read latency

Chris Goffinet Thu, 03 Dec 2009 11:05:06 -0800

Tim,

Very interesting information. Were there any other numbers in tpstats from nodeprobe that are growing?

Can you plot the number of SSTables? Are you using the standard storage-conf.xml defaults?


We've seen reads spike like this with a large number of SSTables.
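One way to plot the SSTable count is to sample it periodically from the data directory. Below is a minimal Python sketch. It assumes (hypothetically) that each SSTable has exactly one data file named `<ColumnFamily>-<generation>-Data.db` in the directory holding one keyspace's files, and that names containing `-tmp-` are in-progress compaction output. The helper name `count_sstables` is illustrative and not part of Cassandra; the exact file layout depends on the version and on the data directory configured in storage-conf.xml.

```python
import os
from collections import Counter


def count_sstables(data_dir):
    """Count SSTable data files per column family in one keyspace data directory.

    Hypothetical naming assumption: one "<ColumnFamily>-<generation>-Data.db"
    file per SSTable; names containing "-tmp-" are skipped as in-progress
    compaction output.
    """
    counts = Counter()
    for name in os.listdir(data_dir):
        if not name.endswith("-Data.db") or "-tmp-" in name:
            continue
        # Drop the "-Data.db" suffix, then the trailing "-<generation>" field.
        stem = name[: -len("-Data.db")]
        column_family = stem.rsplit("-", 1)[0]
        counts[column_family] += 1
    return counts
```

Calling this once a minute (for example from cron) and logging the totals next to the read latency would show whether reads slow down as the SSTable count rises.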

-Chris

On Dec 3, 2009, at 10:58 AM, Freeman, Tim wrote:

I ran another test last night with the build dated 29 Nov 2009. Other than the Cassandra version, the setup was the same as before. I got qualitatively similar results, too: the read latency increased fairly smoothly from 250 ms to 1 s, the GC times reported by jconsole are low, the pending tasks for row-mutation-stage and row-read-stage are fewer than 10, and the pending tasks for the compaction pool number 1615. Last time around, the read latency maxed out at one second. This time, it has just reached one second as I'm writing this, so I don't know yet whether it will continue to increase.
I have attached a fresh graph describing the present run. It's qualitatively similar to the previous one. The vertical units are milliseconds (for latency) and operations per minute (for reads or writes). The horizontal scale is seconds. The feature that's bothering me is the red line for the read latency going diagonally from lower left to the lower-middle right. The scale doesn't make it look dramatic, but Cassandra slowed down by a factor of 4 (from 250 ms to 1 s).
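A straight-line fit is one way to put a number on this kind of drift instead of reading it off the graph. The sketch below is a plain ordinary-least-squares fit in Python; `latency_trend` is a hypothetical helper, and it assumes the latency log can be exported as (seconds, milliseconds) pairs.

```python
def latency_trend(samples):
    """Ordinary least-squares fit of latency (ms) against elapsed time (s).

    samples: list of (seconds, latency_ms) pairs.
    Returns (slope_ms_per_second, intercept_ms).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    # Spread of the timestamps around their mean.
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    if sxx == 0:
        raise ValueError("samples need distinct timestamps")
    # Covariance of time and latency (unnormalized).
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t
```

Multiplying the slope by 3600 gives the drift in milliseconds per hour; a slope that stays clearly above zero across runs would point to a steady slowdown rather than a one-off spike.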
The read and write rates were stable for 45,000 seconds or so, and then the read latency got large enough that the application was starved for reads and started writing less.
If this is worth pursuing, I suppose the next step would be for me to write a small program that reproduces the problem. It should be easy; we're just reading and writing random records. Let me know if there's interest in that. I could also decide to live with a 1000 ms latency here. I'm thinking of putting a cache in the local filesystem in front of Cassandra (or whichever distributed DB we decide to go with), so living with it is definitely possible.
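A reproducer along these lines could be a small load harness that reads and writes random keys from concurrent threads and records per-operation latency with a timestamp. The Python sketch below is hypothetical throughout: `DictStore`, `run_load`, and the `key<N>` key format are illustrative, and the in-memory store is only a stand-in so the harness runs on its own. A real test would replace it with a Cassandra client wrapped to expose the same `put`/`get` methods; no particular client API is assumed. The defaults of 8 reader and 8 writer threads mirror the setup described in this thread.

```python
import random
import threading
import time
from collections import defaultdict


class DictStore:
    """In-memory stand-in for the database under test (hypothetical)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)


def run_load(store, num_keys, record_size, ops_per_thread,
             readers=8, writers=8, seed=0):
    """Run concurrent random reads and writes.

    Returns {"read": [...], "write": [...]}, where each entry is
    (seconds_since_start, latency_seconds) for one operation.
    """
    latencies = defaultdict(list)
    lock = threading.Lock()
    payload = b"x" * record_size  # fixed-size record body
    t0 = time.perf_counter()

    def worker(op, rng):
        for _ in range(ops_per_thread):
            key = "key%d" % rng.randrange(num_keys)  # uniformly random key
            start = time.perf_counter()
            if op == "write":
                store.put(key, payload)
            else:
                store.get(key)
            elapsed = time.perf_counter() - start
            with lock:
                latencies[op].append((start - t0, elapsed))

    ops = ["read"] * readers + ["write"] * writers
    # Each thread gets its own seeded RNG so runs are repeatable.
    threads = [threading.Thread(target=worker, args=(op, random.Random(seed + i)))
               for i, op in enumerate(ops)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(latencies)
```

Running it for hours against the real database and averaging the read latencies per minute of `seconds_since_start` would give the same kind of curve as the attached graph.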
Tim Freeman
Email: tim.free...@hp.com
Desk in Palo Alto: (650) 857-2581
Home: (408) 774-1298
Cell: (408) 348-7536 (No reception during business hours on Monday, Tuesday, and Thursday; call my desk instead.)
-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Tuesday, December 01, 2009 11:10 AM
To: cassandra-user@incubator.apache.org
Subject: Re: Persistently increasing read latency

1) use jconsole to see what is happening to jvm / cassandra internals.
possibly you are slowly exceeding cassandra's ability to keep up with
writes, causing the jvm to spend more and more effort GCing to find
enough memory to keep going

2) you should be at least on 0.4.2 and preferably trunk if you are
stress testing

-Jonathan
On Tue, Dec 1, 2009 at 12:11 PM, Freeman, Tim <tim.free...@hp.com> wrote:
In an 8-hour test run, I've seen the read latency for Cassandra drift fairly linearly from ~460 ms to ~900 ms. Eventually my application gets starved for reads and starts misbehaving. I have attached graphs: horizontal scales are seconds, vertical scales are operations per minute and average milliseconds per operation. The clearest feature is the light blue line in the left graph drifting consistently upward during the run.
I have a Cassandra 0.4.1 database: one node, records of 100 kbytes each, 350K records, 8 threads reading, around 700 reads per minute. There are also 8 threads writing. This is all happening on a 4-core processor that's supporting both the Cassandra node and the code that's generating load for it. I'm reasonably sure that there are no page faults.
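A back-of-envelope sizing from the figures in this message helps put the numbers in proportion. The Python sketch below assumes 1 kbyte = 1,000 bytes and that every read fetches one whole record. It ignores per-column overhead and any obsolete row versions that compaction has not yet removed, so the on-disk size may be larger.

```python
RECORD_BYTES = 100 * 1000   # "records are 100kbytes each" (assuming 1 kbyte = 1000 bytes)
RECORD_COUNT = 350_000      # "350K records"
READS_PER_MINUTE = 700      # "around 700 reads per minute"

# Total size of the live records.
dataset_bytes = RECORD_BYTES * RECORD_COUNT
# Bytes returned by reads per second, one whole record per read.
read_bytes_per_sec = RECORD_BYTES * READS_PER_MINUTE / 60

print(f"dataset: {dataset_bytes / 1e9:.0f} GB")
print(f"read throughput: {read_bytes_per_sec / 1e6:.2f} MB/s")
```

Under these assumptions it prints a data set of 35 GB, roughly 35 times the 1 GB maximum heap set by -Xmx1G, and a read load of about 1.17 MB/s.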
I have attached my storage-conf.xml. Briefly, it has default values, except RpcTimeoutInMillis is 30000 and the partitioner is OrderPreservingPartitioner. Cassandra's garbage collection parameters are:
-Xms128m -Xmx1G -XX:SurvivorRatio=8 -XX:+AggressiveOpts -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
Is this normal behavior? Is there some change to the configuration I should make to stop it from getting slower? If it's not normal, what debugging information should I gather? Should I give up on Cassandra 0.4.1 and move to a newer version?
I'll leave it running for the time being in case there's something useful to extract from it.
Tim Freeman
Email: tim.free...@hp.com
Desk in Palo Alto: (650) 857-2581
Home: (408) 774-1298
Cell: (408) 348-7536 (No reception during business hours on Monday, Tuesday, and Thursday; call my desk instead.)
<latency-trend.png>
