Or you could disable the "failover" feature...

On Tue, 6 Jul 2010, Darryl Kuhn wrote:
> FYI - we made the change on one server and it does appear to have
> resolved premature key expiration.
>
> Effectively, what appears to have been happening was that every so
> often a client was unable to connect to one or more of the memcached
> servers. When this happened it changed the key distribution. Because
> the connection was persistent, it meant that subsequent requests would
> use the same connection handle with the reduced server pool. Turning
> off persistent connections ensures that if we are unable to connect to
> a server in one instance, the failure does not persist for subsequent
> connections.
>
> We'll be rolling this change out to the entire server pool and I'll
> give the list another update with our findings.
>
> Thanks,
> Darryl
>
> On Fri, Jul 2, 2010 at 8:34 AM, Darryl Kuhn <darryl.k...@gmail.com> wrote:
> Found the reset call - that was me being an idiot (I actually
> introduced it when I added logging to debug this issue)... That's been
> removed; however, there was no flush command. Somebody else suggested
> it may have to do with the fact that we're running persistent
> connections, and that if a failure occurred that failure would persist
> and alter hashing rules for subsequent requests on that connection. I
> do see a limited number of connection failures (~5-15) throughout the
> day. I'm going to alter the config to make connections non-persistent
> and see if it makes a difference (however, I'm doubtful this is the
> issue, as we've run with memcache server pools with a single instance -
> which would make it impossible to alter the hashing distribution).
>
> I'll report back what I find - thanks for your continued input!
>
> -Darryl
>
> On Thu, Jul 1, 2010 at 12:28 PM, dormando <dorma...@rydia.net> wrote:
> > Dormando... Thanks for the response. I've moved one of our servers to
> > use an upgraded version running 1.4.5.
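[Editor's note: the failure mode Darryl describes above can be sketched in a few
lines. This assumes, purely for illustration, a simple modulo-style key hash - it
is not the PECL client's exact distribution algorithm. Dropping one server from
the pool remaps a large fraction of keys, so subsequent reads on that connection
hit the "wrong" server and look like premature expiration:]

```python
import hashlib

def pick_server(key, servers):
    # Map a key to a server with simple modulo hashing. Illustrative
    # only; the real client's strategy differs in detail, but the
    # consequence of a shrunken pool is the same.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

full_pool = ["cache1:11211", "cache2:11211", "cache3:11211"]
reduced_pool = full_pool[:2]  # pool after one failed connect is remembered

keys = ["key%d" % i for i in range(10)]
moved = [k for k in keys
         if pick_server(k, full_pool) != pick_server(k, reduced_pool)]
print("%d of %d keys now map to a different server" % (len(moved), len(keys)))
```

[With the PECL memcache client this corresponds to dormando's suggestion:
disable failover with `memcache.allow_failover = 0` in php.ini, and avoid
persistent connections by calling `Memcache::connect()` rather than
`Memcache::pconnect()`.]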
> > Couple of things:
> >
> > * I turned on logging last night
> > * I'm only running -vv at the moment; -vvv generated way more logging
> > than we could handle. As it stands we've generated ~6GB of logs since
> > last night (using -vv). I'm looking at ways of reducing log volume by
> > logging only specific data, or perhaps standing up 10 or 20 instances
> > on one machine (using multiple ports) and turning on -vvv on only one
> > instance. Any suggestions there?
>
> Oh. I thought given your stats output that you had reproduced it on a
> server that was on a dev instance or local machine... but I guess
> that's related to below. Running logs on a production instance with a
> lot of traffic isn't that great of an idea, sorry about that :/
>
> > Looking at the logs, two things jump out at me.
> >
> > * While I had -vvv turned on I saw the "stats reset" command being
> > issued constantly (at least once a second). Nothing in the code that
> > we have does this - do you know if the PHP client does this perhaps?
> > Is this something you've seen in the past?
>
> No, you probably have some code that's doing something intensely wrong.
> Now we should probably add a counter for the number of times a "stats
> reset" has been called...
>
> > * Second, with -vv on I get something like this:
> >
> > <71 get resourceCategoryPath21:984097:
> > >71 sending key resourceCategoryPath21:984097:
> > >71 END
> > <71 set popularProducts:2010-06-28:skinit.com:styleskins:en::2000:image_wall:0__type 0 86400 5
> > >71 STORED
> > <71 set popularProducts:2010-06-28:skinit.com:styleskins:en::2000:image_wall:0 1 86400 130230
> > <59 get domain_host:www.bestbuyskins.com
> > >59 sending key domain_host:www.bestbuyskins.com
> > >59 END
> >
> > * Two questions on the output - what's the "71" and "59"? Second - I
> > would have thought I'd see an "END" after each "get" and "set",
> > however you can see that's not the case.
> >
> > Last question... other than trolling through code, is there a good
> > place to go to understand how to parse out these log files? (I'd
> > prefer to self-help rather than bugging you.)
>
> Looks like you figured that out. The numbers are the file descriptors
> (connections). END/STORED/etc. are the responses.
>
> Honestly, I'm going to take a wild guess that something on your end is
> constantly trying to reset the memcached instance... it's probably
> doing a "flush_all" then a "stats reset", which would hide the flush
> counter. Do you see "flush_all" being called in the logs anywhere?
>
> Go find where you're calling stats reset and make it stop... that'll
> probably help bubble up what the real problem is.
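[Editor's note: for self-help on the -vv format described above - "<fd" is data
read from the client on that file descriptor, ">fd" is data memcached sent back,
and the rest of the line is the raw command or response - a minimal parser can
be sketched as follows. The regex and field names here are my own, not anything
shipped with memcached:]

```python
import re

# Match memcached -vv log lines such as "<71 get foo" or ">71 END".
LINE = re.compile(r'^([<>])(\d+)\s+(.*)$')

def parse_line(line):
    # Return direction, file descriptor, and payload, or None for
    # lines that don't look like traffic (startup banners, etc.).
    m = LINE.match(line.strip())
    if not m:
        return None
    direction, fd, rest = m.groups()
    return {"dir": "recv" if direction == "<" else "send",
            "fd": int(fd),
            "text": rest}

sample = [
    "<71 get resourceCategoryPath21:984097",
    ">71 sending key resourceCategoryPath21:984097",
    ">71 END",
    "<59 get domain_host:www.bestbuyskins.com",
]
for rec in map(parse_line, sample):
    print(rec)
```

[Grouping records by `fd` separates the interleaved connections, which is why an
"END" doesn't immediately follow every "get" in the raw log - another
connection's lines can be printed in between.]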