Just about all responses should happen sub-ms (barring network
jitter).

Some stuff you can check offhand:

- List the versions of all related software you're running (memcached
proper, libmemcached, the ruby client).
- Your full startup arguments to memcached
- Narrow down whether these timeouts happen when initiating a new connection
to memcached, when reusing a persistent connection, or both (may not be
easy).
- If your memcached is (hopefully) new enough, is 'listen_disabled_num'
under the `stats` command nonzero? If so, you're hitting maxconns and
memcached is blocking new connections until old ones disconnect. Seems
unlikely for your case.
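If it helps, here's a quick way to pull that stat without any client
library; a rough sketch assuming the default port 11211 and the plain text
protocol (the `parse_stats`/`fetch_stats` names are made up for the sketch):

```ruby
require 'socket'

# Turn "STAT <name> <value>" lines from the stats command into a hash.
def parse_stats(text)
  text.each_line.with_object({}) do |line, h|
    parts = line.split
    h[parts[1]] = parts[2] if parts[0] == 'STAT'
  end
end

# Ask a running memcached for its stats (host/port are assumptions).
def fetch_stats(host = '127.0.0.1', port = 11211)
  TCPSocket.open(host, port) do |sock|
    sock.write("stats\r\n")
    lines = []
    while (line = sock.gets)
      break if line.start_with?('END')
      lines << line
    end
    parse_stats(lines.join)
  end
end

# stats = fetch_stats
# puts 'hitting maxconns!' if stats['listen_disabled_num'].to_i > 0
```

A nonzero, *growing* listen_disabled_num is the tell; a small static value
from some old burst is less interesting.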

Check dmesg and syslogs on the hosts to ensure iptables isn't complaining
and TIME_WAIT buckets aren't overflowing anywhere, clients or servers.
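For the TIME_WAIT side, something like this gives a quick count by reading
the kernel's TCP table directly; Linux-only sketch, and `count_time_wait`
is a made-up helper:

```ruby
# Field 4 ("st") of each /proc/net/tcp row is the socket state in hex;
# 06 is TIME_WAIT. Skip the header line, count matching rows.
def count_time_wait(proc_tcp_text)
  proc_tcp_text.each_line.drop(1).count { |l| l.split[3] == '06' }
end

if File.readable?('/proc/net/tcp')
  puts "TIME_WAIT sockets: #{count_time_wait(File.read('/proc/net/tcp'))}"
end
```

If that number sits anywhere near your ephemeral port range size, that's
your smoking gun.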

If all software is new and blah blah blah, would you mind running a test
using a pure client (ruby or whatever, just no libmemcached) over
localhost to see if you can reproduce the issue there?
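Something in this direction would do for the pure-client test; a sketch
that speaks the raw text protocol over a socket, assuming memcached on
127.0.0.1:11211 (helper names here are invented):

```ruby
require 'socket'

# Raw memcached text-protocol commands; no libmemcached, no gem.
def set_cmd(key, value, ttl = 0)
  "set #{key} 0 #{ttl} #{value.bytesize}\r\n#{value}\r\n"
end

def get_cmd(key)
  "get #{key}\r\n"
end

# Hammer memcached with gets and report the worst round-trip time in ms.
def worst_get_ms(n = 10_000, host = '127.0.0.1', port = 11211)
  sock = TCPSocket.new(host, port)
  sock.write(set_cmd('latency_probe', 'x'))
  sock.gets # expect "STORED\r\n"
  worst = 0.0
  n.times do
    t0 = Time.now
    sock.write(get_cmd('latency_probe'))
    loop { break if sock.gets.start_with?('END') } # VALUE line, data, END
    ms = (Time.now - t0) * 1000.0
    worst = ms if ms > worst
  end
  sock.close
  worst
end

# puts format('worst get over 10k runs: %.3f ms', worst_get_ms)
```

If the spikes vanish with a bare socket on loopback, the suspects narrow to
libmemcached/the ruby binding; if they don't, keep looking at the host.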

thanks,
-Dormando

On Wed, 7 Apr 2010, Ryan Tomayko wrote:

> We have a few memcached machines doing ~1000 ops/s (9:1 get to set)
> each. These are fairly beefy, non-virtualized, 8-cpu servers with ~14G
> RAM (12G to memcached). They're actually our hot fileserver spares,
> which is why the hardware is so severely overallocated. CPU is
> essentially idle, load rarely goes over 0.2 or so. We've benchmarked
> the things at 100K ops/s over the network without any real tuning or
> tweaking.
>
> Response times average under 5ms with a few hundred active
> connections. Here's a graph of the min and avg response times reported
> by memslap as the number of connections increase from 1 to 250:
>
> http://img.skitch.com/20100407-c67xj7d2b1g979bumif9wm5ebd.png
>
> But we also get occasional 200ms response times in those runs. Here's
> the max response times for the same memslap runs graphed above:
>
> http://img.skitch.com/20100407-pj9djy5k432b2225nimd9qaqcq.png
>
> That's over the network, but I get similar spikiness in max response
> time when I run the same tests over a loopback interface.
>
> I'm wondering, are occasional high max response times like this to be 
> expected?
>
> And, if so, would a low (say 10ms) client read timeout + retry be a
> good strategy for combating them?
>
> We use libmemcached (via the memcached Ruby library) with a few
> hundred persistent connections to each memcached and are experimenting
> with different approaches for setting the receive timeout
> (MEMCACHED_BEHAVIOR_RCV_TIMEOUT). We started with 250ms because that
> seemed like a basically sane value, but the high rate of timeouts (and
> eventual host ejections) caused us to bump that up to 500ms, and then
> 1s, until we settled on 1.5s. That will still timeout occasionally,
> but the frequency is much reduced -- more what I had expected with a
> ~250ms timeout.
>
> 1.5s seems like an insanely high value. I figure we either have some
> kind of server configuration issue or we need to consider using the
> read timeout to guarantee consistent response times. I'm researching
> the former but was hoping someone on the list might have experience
> with the latter. Or even general advice on using client send/receive
> timeouts?
>
> Thanks,
> Ryan
>
