Re: Memcached crashing under load

2009-03-16 Thread Henrik Schröder
On Fri, Mar 13, 2009 at 02:19, meppum mmep...@gmail.com wrote:


 I was load testing memcached and have been experiencing consistent
 crashing when approaching 11k total connections (not concurrent).


Load testing of course has its uses, but is your scenario even remotely
likely to happen in your live system? What your script is actually testing
is how fast you can recycle sockets, but if you use a client that supports
connection pooling, this is never going to become an issue.

Here's some stats from one of our cache servers:

uptime                 449124
time                   1237196926
version                1.2.5
curr_items             168078
curr_connections       24
total_connections      334
connection_structures  38
cmd_get                642209176
cmd_set                6052589
get_hits               499909822
get_misses             142299354

It's getting about 1400 gets per second on average. Many of those are
multi-gets, so the actual number of requests per second is maybe a tenth of
that, somewhere between 100 and 150 req/s. Because all clients use
connection pooling, the number of concurrent connections is only 24, and in
the five days it's been up it has gone through a total of 334 connections.
Those exist because the connection pools vary in size according to load, so
there's some recycling going on. I cannot imagine what kind of load we would
need to put on our systems to run into the problem you are testing for, but
I'm sure that we'd encounter many other problems before we reached that
point.
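
The figures in the paragraph above can be checked against the stats output with a few lines of arithmetic. This is only a sketch: the counter values are copied from the stats quoted in this message, and the variable names are chosen here for illustration.

```python
# Counters copied from the stats output quoted in this message.
uptime_seconds = 449124
cmd_get = 642209176
get_hits = 499909822
get_misses = 142299354
total_connections = 334

# Average get rate over the whole uptime.
gets_per_second = cmd_get / float(uptime_seconds)

# Share of gets that were answered from the cache.
hit_ratio = get_hits / float(cmd_get)

# Uptime in days, and how many new connections were opened per day.
uptime_days = uptime_seconds / 86400.0
connections_per_day = total_connections / uptime_days

print("gets per second:     %.0f" % gets_per_second)
print("hit ratio:           %.1f%%" % (hit_ratio * 100))
print("uptime in days:      %.1f" % uptime_days)
print("new connections/day: %.0f" % connections_per_day)
```

The contrast with the load test is the point: this server opened a few hundred connections in five days, while the test script opens a new one for every single request.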

You didn't say much about your application, but doesn't the Python client
support connection pooling? Why don't you make sure you use that, instead of
lowering some TCP timeout values on your servers so that they can recycle
sockets faster? It seems to me that you are looking for a complex solution
to a non-problem.
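
To illustrate what connection pooling changes, here is a minimal sketch of a pool, assuming a hypothetical factory function in place of a real memcached connection. The class name `ConnectionPool`, its methods, and `open_fake_connection` are invented for this example and are not part of python-memcached or cmemcache. It is written for Python 3; on the Python 2.5 mentioned in the thread the `queue` module is named `Queue`.

```python
import queue
import threading


class ConnectionPool(object):
    """Opens at most max_size connections and reuses the ones handed back."""

    def __init__(self, factory, max_size=4):
        self._factory = factory              # callable that opens one connection
        self._idle = queue.Queue(max_size)   # connections waiting to be reused
        self._lock = threading.Lock()
        self._max_size = max_size
        self.created = 0                     # how many connections were opened

    def acquire(self, timeout=None):
        # Prefer an idle connection over opening a new one.
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            pass
        with self._lock:
            if self.created < self._max_size:
                conn = self._factory()
                self.created += 1
                return conn
        # Pool is at its limit: wait until some caller releases a connection.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        self._idle.put(conn)


def open_fake_connection():
    # Hypothetical stand-in for opening a socket to memcached.
    return object()


pool = ConnectionPool(open_fake_connection, max_size=2)
for i in range(10000):
    conn = pool.acquire()
    # A real client would issue its get/set on conn here.
    pool.release(conn)

print("requests: 10000, connections opened: %d" % pool.created)
```

With the test script from this thread, the simplest equivalent is probably to create the client once, outside the loop, and not call `disconnect_all()` after every request.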


/Henrik


Re: Memcached crashing under load

2009-03-15 Thread Chris Goffinet


As a simple test, try raising memcached's listen backlog (an int) above
the default of 1024.


http://linux.die.net/man/2/listen

The backlog parameter defines the maximum length the queue of  
pending connections may grow to. If a connection request arrives  
with the queue full the client may receive an error with an  
indication of ECONNREFUSED or, if the underlying protocol supports  
retransmission, the request may be ignored so that retries succeed.
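
As a rough illustration of the backlog described in the man page excerpt above, the following sketch opens a listening socket on the loopback interface and connects a few clients before the server calls accept(). The helper name `open_listener` and the backlog value of 8 are assumptions made for this example and have nothing to do with memcached's own code. On Linux the kernel also caps the effective backlog at net.core.somaxconn.

```python
import socket


def open_listener(backlog, host="127.0.0.1", port=0):
    """Return a TCP socket listening with the given backlog (port 0 = any free port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(backlog)
    return srv


listener = open_listener(backlog=8)
address = listener.getsockname()

# These connections complete while the server has not accepted anything yet;
# they wait in the pending-connection queue that the backlog bounds.
clients = [socket.create_connection(address, timeout=2) for _ in range(3)]

# Now drain the queue.
accepted = []
for _ in clients:
    conn, _peer = listener.accept()
    accepted.append(conn)

print("connections queued, then accepted: %d" % len(accepted))

for s in clients + accepted + [listener]:
    s.close()
```

If connections arrive faster than the server accepts them, the queue fills up and new attempts are refused or silently dropped, which a client would see as a connect timeout.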



--
Chris Goffinet
MyBlogLog Senior Performance Engineer

Yahoo!
San Francisco, CA
United States

On Mar 15, 2009, at 5:51 PM, meppum wrote:



I've created a Python script that should trigger this error on any
machine running python-memcached or cmemcache. Hopefully this will
clear things up. It looks like neither the client nor the daemon
crashes; rather, the daemon cannot respond in time, so an error is
thrown. I find this odd because the curr_connections stat never exceeds
the number of possible connections. If anyone can shed some light on
this I'd appreciate it.

This is how I started memcached:
memcached  -m 8 -c 1024 -v -l 127.0.0.1 -d

The script:
try:
    import cmemcache as memcache
except ImportError:
    import memcache

c = memcache.Client(['127.0.0.1:11211'])
c.set('abc', '123')
c.disconnect_all()

for i in range(2):
    if i % 1000 == 0:
        print("iteration: %s" % i)
    # open a fresh connection for every request, then tear it down
    c = memcache.Client(['127.0.0.1:11211'])
    c.get('abc')
    c.disconnect_all()

On Mar 13, 7:32 am, meppum mmep...@gmail.com wrote:

That's a good question. To be clear it's not a hard crash, it's just
that one or the other starts throwing errors regarding connection
timeouts when there should be more than enough connections available.
How can I tell if it's the client or server throwing the errors?

On Mar 12, 9:33 pm, Dustin dsalli...@gmail.com wrote:

  For clarity -- are you saying the server is crashing, or the client?



On Mar 12, 6:19 pm, meppum mmep...@gmail.com wrote:



I was load testing memcached and have been experiencing consistent
crashing when approaching 11k total connections (not concurrent).
Below is a sample of some python code I have developed to isolate this
problem as well as my setup and the error I get. I searched google and
couldn't seem to find an answer.



--



Python Code:



import cmemcache

c = cmemcache.Client(['127.0.0.1:11211'])
c.set('abc', '123')
c.disconnect_all()

for i in range(2):
    c = cmemcache.Client(['127.0.0.1:11211'])
    c.get('abc')
    c.disconnect_all()



-



Error:



[w...@1236906533.149172] mcm_server_connect_next_avail():2338
[not...@1236906533.149172] mcm_server_connect_next_avail():2328
[w...@1236906537.889442] mcm_server_writable():3178: timeout:
Operation now in progress: write select(2) call timed out
[w...@1236906537.889442] mcm_server_connect():2295: select(2) failed:
Operation now in progress: select(2) timed out on establishing
connection
connect(): -1
[not...@1236906537.889442] mcm_server_connect():2302: Operation
already in progress
[not...@1236906537.889442] mcm_server_connect_next_avail():2333:
Operation already in progress
[w...@1236906537.889442] mcm_server_connect_next_avail():2338
[not...@1236906537.889442] mcm_server_connect_next_avail():2328



-



Setup:



-Ubuntu Intrepid
-Libmemcached 1.4.0.rc2-1
-Cmemcache 0.95
-Memcached 1.2.6
-Python 2.5



-meppum




Re: Memcached crashing under load

2009-03-15 Thread David Stanek

On Sun, Mar 15, 2009 at 8:49 PM, meppum mmep...@gmail.com wrote:

 The script:
 try:
        import cmemcache as memcache
 except ImportError:
        import memcache

 c = memcache.Client(['127.0.0.1:11211'])
 c.set('abc', '123')
 c.disconnect_all()

 for i in range(2):
        if i % 1000 == 0:
                print("iteration: %s" % i)

        c = memcache.Client(['127.0.0.1:11211'])
        c.get('abc')
        c.disconnect_all()


This script will not run multiple memcached requests in parallel. Is
that what you were going for?

-- 
David
blog: http://www.traceback.org
twitter: http://twitter.com/dstanek


Re: Memcached crashing under load

2009-03-15 Thread Chris Goffinet


When I ran your script against the daemon, I noticed some timeouts.
After adjusting these sysctl settings I could run the script over and
over without seeing any timeouts:


sudo /sbin/sysctl -w net.ipv4.tcp_tw_recycle=1
sudo /sbin/sysctl -w net.ipv4.tcp_tw_reuse=1
sudo /sbin/sysctl -w net.ipv4.tcp_fin_timeout=10

On your system, what happens when you do the same? Do you see any  
improvement?
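
The sysctl settings above all affect how quickly closed sockets are released, which suggests (as an assumption, not something confirmed in this thread) that the timeouts come from closed connections piling up in the TIME_WAIT state. One way to check is to count those sockets while the script runs. The sketch below parses text in the format of Linux's /proc/net/tcp, where state code 06 means TIME_WAIT; the function name `count_time_wait` and the sample data are made up for this example.

```python
TIME_WAIT = "06"  # TCP state code for TIME_WAIT in /proc/net/tcp on Linux


def count_time_wait(proc_net_tcp_text, port):
    """Count TIME_WAIT sockets whose local or remote port equals `port`."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        # Addresses look like 0100007F:2BCB (hex IP, colon, hex port).
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        remote_port = int(fields[2].rsplit(":", 1)[1], 16)
        if fields[3] == TIME_WAIT and port in (local_port, remote_port):
            count += 1
    return count


# Made-up sample: one listener on 11211, two TIME_WAIT sockets that talked
# to port 11211, and one unrelated TIME_WAIT socket that talked to port 80.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:2BCB 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1
   1: 0100007F:C350 0100007F:2BCB 06 00000000:00000000 03:00000E10 00000000     0        0 0
   2: 0100007F:C351 0100007F:2BCB 06 00000000:00000000 03:00000E10 00000000     0        0 0
   3: 0100007F:C352 0100007F:0050 06 00000000:00000000 03:00000E10 00000000     0        0 0
"""

print("TIME_WAIT sockets for port 11211: %d" % count_time_wait(SAMPLE, 11211))
```

On a live Linux machine the same function can be fed the real table, for example `count_time_wait(open("/proc/net/tcp").read(), 11211)`, before and after changing the sysctl values.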


--
Chris Goffinet
MyBlogLog Senior Performance Engineer

Yahoo!
San Francisco, CA
United States

On Mar 15, 2009, at 6:39 PM, meppum wrote:



Yes, I went for the simplest script that caused the error on my machine.

On Mar 15, 9:29 pm, David Stanek dsta...@dstanek.com wrote:

On Sun, Mar 15, 2009 at 8:49 PM, meppum mmep...@gmail.com wrote:


The script:
try:
    import cmemcache as memcache
except ImportError:
    import memcache

c = memcache.Client(['127.0.0.1:11211'])
c.set('abc', '123')
c.disconnect_all()

for i in range(2):
    if i % 1000 == 0:
        print("iteration: %s" % i)

    c = memcache.Client(['127.0.0.1:11211'])
    c.get('abc')
    c.disconnect_all()


This script will not run multiple memcached requests in parallel. Is
that what you were going for?

--
David
blog:http://www.traceback.org
twitter:http://twitter.com/dstanek




Re: Memcached crashing under load

2009-03-12 Thread Dustin


  For clarity -- are you saying the server is crashing, or the client?
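
One way to tell the two apart, sketched here under assumptions, is to bypass the client library and ask the daemon directly over memcached's text protocol: send `stats` and read `STAT name value` lines until `END`. If this still gets an answer while the client library reports timeouts, the daemon itself has not crashed. The function name `fetch_stats` and its defaults are chosen for this example.

```python
import socket


def fetch_stats(host="127.0.0.1", port=11211, timeout=2.0):
    """Send the text-protocol 'stats' command and return the counters as a dict."""
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        sock.sendall(b"stats\r\n")
        data = b""
        # The reply is a series of STAT lines terminated by END.
        while not data.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    finally:
        sock.close()

    stats = {}
    for line in data.decode("ascii").splitlines():
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats
```

Calling `fetch_stats()` against the daemon started with `-l 127.0.0.1` and comparing `curr_connections` and `total_connections` between runs shows whether the server is still accepting connections while the client reports errors.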

On Mar 12, 6:19 pm, meppum mmep...@gmail.com wrote:
 I was load testing memcached and have been experiencing consistent
 crashing when approaching 11k total connections (not concurrent).
 Below is a sample of some python code I have developed to isolate this
 problem as well as my setup and the error I get. I searched google and
 couldn't seem to find an answer.

 --

 Python Code:

 import cmemcache

 c = cmemcache.Client(['127.0.0.1:11211'])
 c.set('abc', '123')
 c.disconnect_all()

 for i in range(2):
         c = cmemcache.Client(['127.0.0.1:11211'])
         c.get('abc')
         c.disconnect_all()

 -

 Error:

 [w...@1236906533.149172] mcm_server_connect_next_avail():2338
 [not...@1236906533.149172] mcm_server_connect_next_avail():2328
 [w...@1236906537.889442] mcm_server_writable():3178: timeout:
 Operation now in progress: write select(2) call timed out
 [w...@1236906537.889442] mcm_server_connect():2295: select(2) failed:
 Operation now in progress: select(2) timed out on establishing
 connection
 connect(): -1
 [not...@1236906537.889442] mcm_server_connect():2302: Operation
 already in progress
 [not...@1236906537.889442] mcm_server_connect_next_avail():2333:
 Operation already in progress
 [w...@1236906537.889442] mcm_server_connect_next_avail():2338
 [not...@1236906537.889442] mcm_server_connect_next_avail():2328

 -

 Setup:

 -Ubuntu Intrepid
 -Libmemcached 1.4.0.rc2-1
 -Cmemcache 0.95
 -Memcached 1.2.6
 -Python 2.5

 -meppum