I've lately been running into some limitations using memcache on EC2. Given that this is arguably an EC2 issue I might get flamed, but I was hoping to get opinions from people here who have been running memcache there.

We currently run a cluster of approximately 40 memcache servers with about 6.5 GB of RAM each, on m1.medium EC2 instances. I was in the process of reducing the number of servers while increasing the memory per instance from 6.5 GB to about 30 GB. I've started noticing that some servers seem to hit bandwidth limitations, though not consistently: some servers push 6 MB/s fine, while others see packet loss and TCP timeouts at 4 MB/s.

Most of my issues show up as the PHP client complaining about "Connection timed out (110)" or "failed with: Failed reading line from stream (0)". This is sporadic at best, but usually concentrated on the same servers. I've replaced those instances, hoping to land on a better area or a less congested switch, but the issue persists on the same servers. I've also tried multiple kernel settings hoping to alleviate this, but I'm not sure whether those apply to the machine without restarting the memcache daemon.

Bottom line: if I instead add, say, another 30 servers to split network traffic across more machines, will that noticeably slow down the key-hashing part of memcache, or the initialization of the client when we loop over the server list to add them? I should note that I use the pecl memcache client.
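For context on the hashing question: my understanding is that pecl memcache's default "standard" strategy is essentially a hash of the key modulo the server count, so picking a server is O(1) no matter how many servers are in the pool — the cost that actually grows when you add servers is key remapping, not hashing speed. Here's a toy Python model of what I mean (this is an illustration, not the exact pecl implementation):

```python
import zlib

def pick_server(key: str, num_servers: int) -> int:
    # Toy model of the "standard" strategy: CRC32 of the key, modulo
    # the number of servers. O(1) per lookup regardless of pool size.
    return zlib.crc32(key.encode()) % num_servers

# How many keys land on a different server if the pool grows from 40 to 70?
keys = [f"user:{i}" for i in range(10000)]
moved = sum(1 for k in keys if pick_server(k, 40) != pick_server(k, 70))
print(f"{moved / len(keys):.0%} of keys map to a different server")
```

With plain modulo hashing, most keys remap when the pool size changes, which is a burst of cache misses rather than a slowdown in hashing itself. That's why, if I do go this route, I'd probably switch to consistent hashing (memcache.hash_strategy=consistent in php.ini) so that adding servers only remaps a small fraction of keys.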