I'm running 0.4.1.  I used to get timeouts; after I raised the timeout from 5 
seconds to 30 seconds, they went away.  The relevant line from 
storage-conf.xml is:

  <RpcTimeoutInMillis>30000</RpcTimeoutInMillis>

The maximum latency is often just over 5 seconds in the worst case when I fetch 
thousands of records, so the default timeout of 5 seconds happens to be a little 
too low for me.  My records are ~100 Kbytes each; you may get different 
results if your records are much larger or much smaller.
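A quick way to find that worst-case number is to time the fetch loop yourself. 
Here's a rough sketch in Python; "keys" and "fetch_record" are hypothetical 
stand-ins for your own key list and whatever client call you make per key:

  import time

  def worst_case_latency(keys, fetch_record):
      # keys: your own list of keys; fetch_record: whatever per-key
      # client call you make (both are stand-ins, not a real API)
      worst = 0.0
      for key in keys:
          start = time.time()
          fetch_record(key)
          worst = max(worst, time.time() - start)
      return worst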

The other issue I was having a few days ago was that the machine was page 
faulting, so garbage collections were taking forever.  Some GCs took 20 minutes 
in another Java process.  I didn't have verbose:gc turned on in Cassandra, so 
I'm not sure what the score was there, but there's little reason to expect it 
to be qualitatively better, since it's pretty random which process gets some of 
its pages swapped out.  On a Linux machine, run "vmstat 5" while your machine is 
loaded; if you see numbers greater than 0 in the "si" (swap-in) and "so" 
(swap-out) columns in rows after the first, the machine is swapping, and you 
should tell one of your Java processes to take less memory.
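If you want to see what the collector is doing, turn on verbose GC in the JVM 
options.  For example, with the standard Sun JVM flags, and assuming your 
startup script is the stock bin/cassandra.in.sh (which is where JVM_OPTS lives 
in 0.4):

  JVM_OPTS="$JVM_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"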

Tim Freeman
Email: tim.free...@hp.com
Desk in Palo Alto: (650) 857-2581
Home: (408) 774-1298
Cell: (408) 348-7536 (No reception business hours Monday, Tuesday, and 
Thursday; call my desk instead.)

From: Chris Were [mailto:chris.w...@gmail.com]
Sent: Monday, November 16, 2009 9:47 AM
To: Jonathan Ellis
Cc: cassandra-user@incubator.apache.org
Subject: Re: Timeout Exception

I turned on debug logging for a few days and timeouts happened across pretty 
much all requests. I couldn't see any particular request that was consistently 
the problem.

After some experimenting, it seems that shutting down Cassandra and restarting 
resolves the problem. Once it hits the JVM memory limit, however, the timeouts 
start again. I have read the page on memtable thresholds and have tried 
thresholds of 32MB, 64MB, and 128MB with no noticeable difference. Cassandra is 
set to use 7GB of memory. I have 12 CFs, but only 6 of those have lots of 
data.
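For reference, the thresholds I've been changing are the global memtable 
settings in storage-conf.xml, which in 0.4 look like this (64MB was one of the 
values I tried; as I understand it, whichever threshold is hit first triggers 
a flush):

  <MemtableSizeInMB>64</MemtableSizeInMB>
  <MemtableObjectCountInMillions>0.1</MemtableObjectCountInMillions>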

Cheers,
Chris
On Tue, Nov 10, 2009 at 11:55 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
if you're timing out doing a slice on 10 columns w/ 10% cpu used,
something is broken

is it consistent as to which keys this happens on?  try turning on
debug logging and seeing where the latency is coming from.
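(In 0.4, turning on debug logging means editing conf/log4j.properties and 
raising the root logger level.  Assuming the stock config, the line to change 
looks something like:

  log4j.rootLogger=DEBUG,stdout,R

where the appender names after the level may differ in your copy.)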

On Tue, Nov 10, 2009 at 1:53 PM, Chris Were <chris.w...@gmail.com> wrote:
>
> On Tue, Nov 10, 2009 at 11:50 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> On Tue, Nov 10, 2009 at 1:49 PM, Chris Were <chris.w...@gmail.com> wrote:
>> > Maybe... but it's not just multigets; it also happens when retrieving one
>> > row with get_slice.
>>
>> how many of the 3M columns are you trying to slice at once?
>
> Sorry, I must have mixed up the terminology.
> There's ~3M keys, but fewer than 10 columns in each. The get_slice calls are
> to retrieve all the columns (10) for a given key.
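For anyone following along, a minimal sketch of that kind of get_slice call 
against the 0.4-era Thrift API looks something like this in Python.  The host, 
port, keyspace ("Keyspace1"), and column family ("Standard1") are 
placeholders, and the generated module names may differ depending on how you 
ran the Thrift compiler:

  from thrift.transport import TSocket, TTransport
  from thrift.protocol import TBinaryProtocol
  from cassandra import Cassandra
  from cassandra.ttypes import (ColumnParent, SlicePredicate, SliceRange,
                                ConsistencyLevel)

  transport = TTransport.TBufferedTransport(TSocket.TSocket('localhost', 9160))
  client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
  transport.open()

  # Ask for up to 100 columns for one key; with <= 10 columns per key,
  # this returns the whole row in one call.
  predicate = SlicePredicate(slice_range=SliceRange(start='', finish='',
                                                    reversed=False, count=100))
  result = client.get_slice('Keyspace1', 'somekey',
                            ColumnParent(column_family='Standard1'),
                            predicate, ConsistencyLevel.ONE)
  row = dict((c.column.name, c.column.value) for c in result)
  transport.close()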
