Or you could disable the "failover" feature...
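With pecl/memcache that would be the allow_failover ini directive; a sketch, assuming php.ini-style configuration on your web servers:

```ini
; Hypothetical php.ini fragment: stop pecl/memcache from transparently
; redistributing keys to the remaining servers when a server fails.
; (Default is 1, i.e. failover enabled.)
memcache.allow_failover = 0
```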

On Tue, 6 Jul 2010, Darryl Kuhn wrote:

> FYI - we made the change on one server and it does appear to have resolved 
> premature key expiration.
>
> Effectively what appears to have been happening was that every so often a 
> client was unable to connect to one or more of the memcached servers. When 
> this happened it changed the key distribution. Because
> the connection was persistent it meant that subsequent requests would use the 
> same connection handle with the reduced server pool. Turning off persistent 
> connections ensures that if we are unable to
> connect to a server in one instance the failure does not persist for 
> subsequent connections.
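The failure mode described above can be sketched like this (a toy model, not the pecl/memcache internals; simple modulo hashing over a hypothetical three-server pool):

```python
# Sketch: how dropping one server from the pool remaps keys under
# simple modulo hashing over crc32.
import zlib

def pick_server(key, servers):
    # Hash the key and take it modulo the number of live servers.
    return servers[zlib.crc32(key.encode()) % len(servers)]

full_pool = ["mc1:11211", "mc2:11211", "mc3:11211"]
reduced_pool = ["mc1:11211", "mc3:11211"]  # mc2 dropped after a failed connect

keys = ["user:42", "session:abc", "page:home", "cart:9"]
moved = [k for k in keys
         if pick_server(k, full_pool) != pick_server(k, reduced_pool)]

# With a persistent connection, the reduced pool sticks to the handle,
# so these keys keep hashing to the "wrong" server even after mc2
# comes back, and entries written before the failure look expired.
print(moved)
```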
>
> We'll be rolling this change out to the entire server pool and I'll give the 
> list another update with our findings.
>
> Thanks,
> Darryl
>
> On Fri, Jul 2, 2010 at 8:34 AM, Darryl Kuhn <darryl.k...@gmail.com> wrote:
>       Found the reset call - that was me being an idiot (I actually 
> introduced it when I added logging to debug this issue)... That's been 
> removed; however, there was no flush command. Somebody else
>       suggested it may have to do with the fact that we're running persistent 
> connections; and that if a failure occurred that failure would persist and 
> alter hashing rules for subsequent requests on
>       that connection. I do see a limited number of connection failures 
> (~5-15) throughout the day. I'm going to alter the config to make connections 
> non-persistent and see if it makes a difference
>       (however I'm doubtful this is the issue as we've run with memcache 
> server pools with a single instance - which would make it impossible to alter 
> the hashing distribution).
>
>       I'll report back what I find - thanks for your continued input!
>
>       -Darryl
>
>
> On Thu, Jul 1, 2010 at 12:28 PM, dormando <dorma...@rydia.net> wrote:
>       > Dormando... Thanks for the response. I've moved one of our servers to 
> use an upgraded version running 1.4.5. Couple of things:
>       >  *  I turned on logging last night
>       >  *  I'm only running -vv at the moment; -vvv generated way more 
> logging than we could handle. As it stands we've generated ~6GB of logs since 
> last night (using -vv). I'm looking at ways
>       of reducing log
>       >     volume by logging only specific data or perhaps standing up 10 or 
> 20 instances on one machine (using multiple ports) and turning on -vvv on 
> only one instance. Any suggestions there?
>
> Oh. I thought given your stats output that you had reproduced it on a
> server that was on a dev instance or local machine... but I guess that's
> related to below. Running logs on a production instance with a lot of
> traffic isn't that great of an idea, sorry about that :/
>
> > Looking at the logs two things jump out at me.
> >  *  While I had -vvv turned on I saw "stats reset" command being issued 
> > constantly (at least once a second). Nothing in the code that we have does 
> > this - do you know if the PHP client does
> this perhaps? Is
> >     this something you've seen in the past?
>
> No, you probably have some code that's doing something intensely wrong.
> Now we should probably add a counter for the number of times a "stats
> reset" has been called...
>
> >  *  Second with -vv on I get something like this:
> >      +  <71 get resourceCategoryPath21:984097:
> >         >71 sending key resourceCategoryPath21:984097:
> >         >71 END
> >         <71 set 
> > popularProducts:2010-06-28:skinit.com:styleskins:en::2000:image_wall:0__type
> >  0 86400 5
> >         >71 STORED
> >         <71 set 
> > popularProducts:2010-06-28:skinit.com:styleskins:en::2000:image_wall:0 1 
> > 86400 130230
> >         <59 get domain_host:www.bestbuyskins.com
> >         >59 sending key domain_host:www.bestbuyskins.com
> >         >59 END
> >  *  Two questions on the output - what's the "71" and "59"? Second - I 
> > would have thought I'd see an "END" after each "get" and "set" however you 
> > can see that's not the case.
> >
> > Last question... other than trolling through code is there a good place to 
> > go to understand how to parse out these log files (I'd prefer to self-help 
> > rather than bugging you)?
>
> Looks like you figured that out. The numbers are the file descriptors
> (connections). END/STORED/etc are the responses.
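If it helps for self-serve parsing, here's a rough sketch of how you could slice the -vv output by connection (assuming the `<fd command` / `>fd response` line format shown above; not an official parser):

```python
# A rough parser for memcached -vv output, assuming each line looks like
# "<71 get somekey" (client -> server) or ">71 END" (server -> client),
# where the number is the connection's file descriptor.
import re
from collections import Counter

LINE = re.compile(r'^([<>])(\d+)\s+(\S+)')

def parse_line(line):
    """Return (direction, fd, first_token), or None if the line doesn't match."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    direction, fd, token = m.groups()
    return direction, int(fd), token

log = [
    "<71 get resourceCategoryPath21:984097:",
    ">71 sending key resourceCategoryPath21:984097:",
    ">71 END",
    "<59 get domain_host:www.bestbuyskins.com",
]

# Tally inbound commands per connection; a fd that keeps issuing
# "stats" (reset) or "flush_all" should stand out immediately.
cmds = Counter()
for line in log:
    parsed = parse_line(line)
    if parsed is not None and parsed[0] == '<':
        cmds[(parsed[1], parsed[2])] += 1
print(cmds)
```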
>
> Honestly I'm going to take a wild guess that something on your end is
> constantly trying to reset the memcached instance.. it's probably doing a
> "flush_all" then a "stats reset" which would hide the flush counter. Do
> you see "flush_all" being called in the logs anywhere?
>
> Go find where you're calling stats reset and make it stop... that'll
> probably help bubble up what the real problem is.
