Re: Disappearing Keys

2010-07-07 Thread Adam Lee
+1

On Tue, Jul 6, 2010 at 2:47 PM, dormando dorma...@rydia.net wrote:

 Or you could disable the failover feature...

 On Tue, 6 Jul 2010, Darryl Kuhn wrote:

  FYI - we made the change on one server and it does appear to have
 resolved premature key expiration.
 
  Effectively what appears to have been happening was that every so often a
 client was unable to connect to one or more of the memcached servers. When
 this happened it changed the key distribution. Because
  the connection was persistent it meant that subsequent requests would use
 the same connection handle with the reduced server pool. Turning off
 persistent connections ensures that if we are unable to
  connect to a server in one instance the failure does not persist for
 subsequent connections.
 
  We'll be rolling this change out to the entire server pool and I'll give
 the list another update with our findings.
 
  Thanks,
  Darryl
 
  On Fri, Jul 2, 2010 at 8:34 AM, Darryl Kuhn darryl.k...@gmail.com
 wrote:
Found the reset call - that was me being an idiot (I actually
 introduced it when I added logging to debug this issue)... That's been
 removed however there was no flush command. Somebody else
suggested it may have to do with the fact that we're running
 persistent connections; and that if a failure occurred that failure would
 persist and alter hashing rules for subsequent requests on
that connection. I do see a limited number of connection failures
 (~5-15) throughout the day. I'm going to alter the config to make
 connections non-persistent and see if it makes a difference
(however I'm doubtful this is the issue as we've run with memcache
 server pools with a single instance - which would make it impossible to
 alter the hashing distribution).
 
I'll report back what I find - thanks for your continued input!
 
-Darryl
 
 
  On Thu, Jul 1, 2010 at 12:28 PM, dormando dorma...@rydia.net wrote:
 Dormando... Thanks for the response. I've moved one of our
 servers to use an upgraded version running 1.4.5. Couple of things:
  *  I turned on logging last night
  *  I'm only running -vv at the moment; -vvv generated way more
 logging than we could handle. As it stands we've generated ~6GB of logs
 since last night (using -vv). I'm looking at ways
of reducing log
 volume by logging only specific data or perhaps standing up
 10 or 20 instances on one machine (using multiple ports) and turning on -vvv
 on only one instance. Any suggestions there?
 
  Oh. I thought given your stats output that you had reproduced it on a
  server that was on a dev instance or local machine... but I guess that's
  related to below. Running logs on a production instance with a lot of
  traffic isn't that great of an idea, sorry about that :/
 
   Looking at the logs two things jump out at me.
*  While I had -vvv turned on I saw stats reset command being issued
 constantly (at least once a second). Nothing in the code that we have does
 this - do you know if the PHP client does
  this perhaps? Is
   this something you've seen in the past?
 
  No, you probably have some code that's doing something intensely wrong.
  Now we should probably add a counter for the number of times a stats
  reset has been called...
 
*  Second with -vv on I get something like this:
    <71 get resourceCategoryPath21:984097:
    >71 sending key resourceCategoryPath21:984097:
    >71 END
    <71 set popularProducts:2010-06-28:skinit.com:styleskins:en::2000:image_wall:0__type 0 86400 5
    >71 STORED
    <71 set popularProducts:2010-06-28:skinit.com:styleskins:en::2000:image_wall:0 1 86400 130230
    <59 get domain_host:www.bestbuyskins.com
    >59 sending key domain_host:www.bestbuyskins.com
    >59 END
*  Two questions on the output - what's the 71 and 59? Second - I
 would have thought I'd see an END after each get and set however you
 can see that's not the case.
  
   Last question... other than trolling through code is there a good place
 to go to understand how to parse out these log files (I'd prefer to
 self-help rather than bugging you)?
 
  Looks like you figured that out. The numbers are the file descriptors
  (connections). END/STORED/etc are the responses.
 
  Honestly I'm going to take a wild guess that something on your end is
  constantly trying to reset the memcached instance.. it's probably doing a
  flush_all then a stats reset which would hide the flush counter. Do
  you see flush_all being called in the logs anywhere?
 
  Go find where you're calling stats reset and make it stop... that'll
  probably help bubble up what the real problem is.
 
 
 
 
 




-- 
awl


Re: Disappearing Keys

2010-07-06 Thread Darryl Kuhn
FYI - we made the change on one server and it does appear to have resolved
premature key expiration.

Effectively what appears to have been happening was that every so often a
client was unable to connect to one or more of the memcached servers. When
this happened it changed the key distribution. Because the connection was
persistent it meant that subsequent requests would use the same connection
handle with the reduced server pool. Turning off persistent connections
ensures that if we are unable to connect to a server in one instance the
failure does not persist for subsequent connections.

We'll be rolling this change out to the entire server pool and I'll give the
list another update with our findings.

Thanks,
Darryl

On Fri, Jul 2, 2010 at 8:34 AM, Darryl Kuhn darryl.k...@gmail.com wrote:

 Found the reset call - that was me being an idiot (I actually introduced it
 when I added logging to debug this issue)... That's been removed however
 there was no flush command. Somebody else suggested it may have to do with
 the fact that we're running persistent connections; and that if a failure
 occurred that failure would persist and alter hashing rules for subsequent
 requests on that connection. I do see a limited number of connection
 failures (~5-15) throughout the day. I'm going to alter the config to make
 connections non-persistent and see if it makes a difference (however I'm
 doubtful this is the issue as we've run with memcache server pools with a
 single instance - which would make it impossible to alter the hashing
 distribution).

 I'll report back what I find - thanks for your continued input!

 -Darryl


 On Thu, Jul 1, 2010 at 12:28 PM, dormando dorma...@rydia.net wrote:

  Dormando... Thanks for the response. I've moved one of our servers to
 use an upgraded version running 1.4.5. Couple of things:
   *  I turned on logging last night
   *  I'm only running -vv at the moment; -vvv generated way more logging
 than we could handle. As it stands we've generated ~6GB of logs since last
 night (using -vv). I'm looking at ways of reducing log
  volume by logging only specific data or perhaps standing up 10 or 20
 instances on one machine (using multiple ports) and turning on -vvv on only
 one instance. Any suggestions there?

 Oh. I thought given your stats output that you had reproduced it on a
 server that was on a dev instance or local machine... but I guess that's
 related to below. Running logs on a production instance with a lot of
 traffic isn't that great of an idea, sorry about that :/

  Looking at the logs two things jump out at me.
   *  While I had -vvv turned on I saw stats reset command being issued
 constantly (at least once a second). Nothing in the code that we have does
 this - do you know if the PHP client does this perhaps? Is
  this something you've seen in the past?

 No, you probably have some code that's doing something intensely wrong.
 Now we should probably add a counter for the number of times a stats
 reset has been called...

   *  Second with -vv on I get something like this:
    <71 get resourceCategoryPath21:984097:
    >71 sending key resourceCategoryPath21:984097:
    >71 END
    <71 set popularProducts:2010-06-28:skinit.com:styleskins:en::2000:image_wall:0__type 0 86400 5
    >71 STORED
    <71 set popularProducts:2010-06-28:skinit.com:styleskins:en::2000:image_wall:0 1 86400 130230
    <59 get domain_host:www.bestbuyskins.com
    >59 sending key domain_host:www.bestbuyskins.com
    >59 END
   *  Two questions on the output - what's the 71 and 59? Second - I
 would have thought I'd see an END after each get and set however you
 can see that's not the case.
 
  Last question... other than trolling through code is there a good place
 to go to understand how to parse out these log files (I'd prefer to
 self-help rather than bugging you)?

 Looks like you figured that out. The numbers are the file descriptors
 (connections). END/STORED/etc are the responses.

 Honestly I'm going to take a wild guess that something on your end is
 constantly trying to reset the memcached instance.. it's probably doing a
 flush_all then a stats reset which would hide the flush counter. Do
 you see flush_all being called in the logs anywhere?

 Go find where you're calling stats reset and make it stop... that'll
 probably help bubble up what the real problem is.





Re: Disappearing Keys

2010-07-06 Thread dormando
Or you could disable the failover feature...

On Tue, 6 Jul 2010, Darryl Kuhn wrote:

 FYI - we made the change on one server and it does appear to have resolved 
 premature key expiration.

 Effectively what appears to have been happening was that every so often a 
 client was unable to connect to one or more of the memcached servers. When 
 this happened it changed the key distribution. Because
 the connection was persistent it meant that subsequent requests would use the 
 same connection handle with the reduced server pool. Turning off persistent 
 connections ensures that if we are unable to
 connect to a server in one instance the failure does not persist for 
 subsequent connections.

 We'll be rolling this change out to the entire server pool and I'll give the 
 list another update with our findings.

 Thanks,
 Darryl

 On Fri, Jul 2, 2010 at 8:34 AM, Darryl Kuhn darryl.k...@gmail.com wrote:
   Found the reset call - that was me being an idiot (I actually 
 introduced it when I added logging to debug this issue)... That's been 
 removed however there was no flush command. Somebody else
   suggested it may have to do with the fact that we're running persistent 
 connections; and that if a failure occurred that failure would persist and 
 alter hashing rules for subsequent requests on
   that connection. I do see a limited number of connection failures 
 (~5-15) throughout the day. I'm going to alter the config to make connections 
 non-persistent and see if it makes a difference
   (however I'm doubtful this is the issue as we've run with memcache 
 server pools with a single instance - which would make it impossible to alter 
 the hashing distribution).

   I'll report back what I find - thanks for your continued input!

   -Darryl


 On Thu, Jul 1, 2010 at 12:28 PM, dormando dorma...@rydia.net wrote:
Dormando... Thanks for the response. I've moved one of our servers to 
 use an upgraded version running 1.4.5. Couple of things:
 *  I turned on logging last night
 *  I'm only running -vv at the moment; -vvv generated way more 
 logging than we could handle. As it stands we've generated ~6GB of logs since 
 last night (using -vv). I'm looking at ways
   of reducing log
    volume by logging only specific data or perhaps standing up 10 or 
 20 instances on one machine (using multiple ports) and turning on -vvv on 
 only one instance. Any suggestions there?

 Oh. I thought given your stats output that you had reproduced it on a
 server that was on a dev instance or local machine... but I guess that's
 related to below. Running logs on a production instance with a lot of
 traffic isn't that great of an idea, sorry about that :/

  Looking at the logs two things jump out at me.
   *  While I had -vvv turned on I saw stats reset command being issued 
  constantly (at least once a second). Nothing in the code that we have does 
  this - do you know if the PHP client does
 this perhaps? Is
      this something you've seen in the past?

 No, you probably have some code that's doing something intensely wrong.
 Now we should probably add a counter for the number of times a stats
 reset has been called...

   *  Second with -vv on I get something like this:
        <71 get resourceCategoryPath21:984097:
        >71 sending key resourceCategoryPath21:984097:
        >71 END
        <71 set popularProducts:2010-06-28:skinit.com:styleskins:en::2000:image_wall:0__type 0 86400 5
        >71 STORED
        <71 set popularProducts:2010-06-28:skinit.com:styleskins:en::2000:image_wall:0 1 86400 130230
        <59 get domain_host:www.bestbuyskins.com
        >59 sending key domain_host:www.bestbuyskins.com
        >59 END
   *  Two questions on the output - what's the 71 and 59? Second - I 
  would have thought I'd see an END after each get and set however you 
  can see that's not the case.
 
  Last question... other than trolling through code is there a good place to 
  go to understand how to parse out these log files (I'd prefer to self-help 
  rather than bugging you)?

 Looks like you figured that out. The numbers are the file descriptors
 (connections). END/STORED/etc are the responses.

 Honestly I'm going to take a wild guess that something on your end is
 constantly trying to reset the memcached instance.. it's probably doing a
 flush_all then a stats reset which would hide the flush counter. Do
 you see flush_all being called in the logs anywhere?

 Go find where you're calling stats reset and make it stop... that'll
 probably help bubble up what the real problem is.







Re: Disappearing Keys

2010-07-02 Thread Darryl Kuhn
Found the reset call - that was me being an idiot (I actually introduced it
when I added logging to debug this issue)... That's been removed however
there was no flush command. Somebody else suggested it may have to do with
the fact that we're running persistent connections; and that if a failure
occurred that failure would persist and alter hashing rules for subsequent
requests on that connection. I do see a limited number of connection
failures (~5-15) throughout the day. I'm going to alter the config to make
connections non-persistent and see if it makes a difference (however I'm
doubtful this is the issue as we've run with memcache server pools with a
single instance - which would make it impossible to alter the hashing
distribution).

I'll report back what I find - thanks for your continued input!

-Darryl

On Thu, Jul 1, 2010 at 12:28 PM, dormando dorma...@rydia.net wrote:

  Dormando... Thanks for the response. I've moved one of our servers to use
 an upgraded version running 1.4.5. Couple of things:
   *  I turned on logging last night
   *  I'm only running -vv at the moment; -vvv generated way more logging
 than we could handle. As it stands we've generated ~6GB of logs since last
 night (using -vv). I'm looking at ways of reducing log
  volume by logging only specific data or perhaps standing up 10 or 20
 instances on one machine (using multiple ports) and turning on -vvv on only
 one instance. Any suggestions there?

 Oh. I thought given your stats output that you had reproduced it on a
 server that was on a dev instance or local machine... but I guess that's
 related to below. Running logs on a production instance with a lot of
 traffic isn't that great of an idea, sorry about that :/

  Looking at the logs two things jump out at me.
   *  While I had -vvv turned on I saw stats reset command being issued
 constantly (at least once a second). Nothing in the code that we have does
 this - do you know if the PHP client does this perhaps? Is
  this something you've seen in the past?

 No, you probably have some code that's doing something intensely wrong.
 Now we should probably add a counter for the number of times a stats
 reset has been called...

   *  Second with -vv on I get something like this:
    <71 get resourceCategoryPath21:984097:
    >71 sending key resourceCategoryPath21:984097:
    >71 END
    <71 set popularProducts:2010-06-28:skinit.com:styleskins:en::2000:image_wall:0__type 0 86400 5
    >71 STORED
    <71 set popularProducts:2010-06-28:skinit.com:styleskins:en::2000:image_wall:0 1 86400 130230
    <59 get domain_host:www.bestbuyskins.com
    >59 sending key domain_host:www.bestbuyskins.com
    >59 END
   *  Two questions on the output - what's the 71 and 59? Second - I
 would have thought I'd see an END after each get and set however you
 can see that's not the case.
 
  Last question... other than trolling through code is there a good place
 to go to understand how to parse out these log files (I'd prefer to
 self-help rather than bugging you)?

 Looks like you figured that out. The numbers are the file descriptors
 (connections). END/STORED/etc are the responses.

 Honestly I'm going to take a wild guess that something on your end is
 constantly trying to reset the memcached instance.. it's probably doing a
 flush_all then a stats reset which would hide the flush counter. Do
 you see flush_all being called in the logs anywhere?

 Go find where you're calling stats reset and make it stop... that'll
 probably help bubble up what the real problem is.



Re: Disappearing Keys

2010-07-01 Thread dormando
 Dormando... Thanks for the response. I've moved one of our servers to use an 
 upgraded version running 1.4.5. Couple of things:
  *  I turned on logging last night
  *  I'm only running -vv at the moment; -vvv generated way more logging than 
 we could handle. As it stands we've generated ~6GB of logs since last night 
 (using -vv). I'm looking at ways of reducing log
 volume by logging only specific data or perhaps standing up 10 or 20 
 instances on one machine (using multiple ports) and turning on -vvv on only 
 one instance. Any suggestions there?

Oh. I thought given your stats output that you had reproduced it on a
server that was on a dev instance or local machine... but I guess that's
related to below. Running logs on a production instance with a lot of
traffic isn't that great of an idea, sorry about that :/

 Looking at the logs two things jump out at me.
  *  While I had -vvv turned on I saw stats reset command being issued 
 constantly (at least once a second). Nothing in the code that we have does 
 this - do you know if the PHP client does this perhaps? Is
 this something you've seen in the past?

No, you probably have some code that's doing something intensely wrong.
Now we should probably add a counter for the number of times a stats
reset has been called...

  *  Second with -vv on I get something like this:
   <71 get resourceCategoryPath21:984097:
   >71 sending key resourceCategoryPath21:984097:
   >71 END
   <71 set popularProducts:2010-06-28:skinit.com:styleskins:en::2000:image_wall:0__type 0 86400 5
   >71 STORED
   <71 set popularProducts:2010-06-28:skinit.com:styleskins:en::2000:image_wall:0 1 86400 130230
   <59 get domain_host:www.bestbuyskins.com
   >59 sending key domain_host:www.bestbuyskins.com
   >59 END
  *  Two questions on the output - what's the 71 and 59? Second - I would 
 have thought I'd see an END after each get and set; however, you can see 
 that's not the case.

 Last question... other than trolling through code is there a good place to go 
 to understand how to parse out these log files (I'd prefer to self-help 
 rather than bugging you)?

Looks like you figured that out. The numbers are the file descriptors
(connections). END/STORED/etc are the responses.

Honestly I'm going to take a wild guess that something on your end is
constantly trying to reset the memcached instance.. it's probably doing a
flush_all then a stats reset which would hide the flush counter. Do
you see flush_all being called in the logs anywhere?

Go find where you're calling stats reset and make it stop... that'll
probably help bubble up what the real problem is.


Re: Disappearing Keys

2010-06-27 Thread dormando

 Any thoughts on what might be going on here?

 As for vitals/system config:

 Here's a recent stats dump:
 STAT pid 19986
 STAT uptime 8526687
 STAT time 1277416425
 STAT version 1.2.8

You should probably upgrade, but I think this version has what you may
need...

If you can reproduce this against an idle memcached, start up an instance
with -vvv (three v's), which will tell you why a GET failed, as well as
all of the parameters of any SETs/GETs you run.

So you can see if the GET failed because it expired, was evicted,
expired due to a flush command (which doesn't seem to be your case), etc.
You can also see if your PHP client is actually setting the timeout
correctly or not.


Disappearing Keys

2010-06-24 Thread Darryl Kuhn
We've been battling an issue for a while now (in fact we're not sure quite how
long it's been going on). It appears that keys are being evicted before
their expiration time; however, when we check stats we see evicted: 0 on
all slabs. To track this down we've added logging (see below) for a
particular key.

Server       Date         Time      Request  Key                                                       Message
10.16.18.35  2010-Jun-23  1:00:46   15463    popularProducts:2010-06-21:collegiate:en:7:image_wall:0  Get failed
10.16.18.35  2010-Jun-23  1:00:47   15463    popularProducts:2010-06-21:collegiate:en:7:image_wall:0  Set to expire in 86400 seconds
10.16.18.24  2010-Jun-23  9:58:29   55378    popularProducts:2010-06-21:collegiate:en:7:image_wall:0  Get failed
10.16.18.24  2010-Jun-23  9:58:43   55378    popularProducts:2010-06-21:collegiate:en:7:image_wall:0  Set to expire in 86400 seconds
10.16.18.19  2010-Jun-23  10:10:54  87555    popularProducts:2010-06-21:collegiate:en:7:image_wall:0  Get failed
10.16.18.19  2010-Jun-23  10:11:08  87555    popularProducts:2010-06-21:collegiate:en:7:image_wall:0  Set to expire in 86400 seconds
10.16.18.14  2010-Jun-23  10:40:57  55731    popularProducts:2010-06-21:collegiate:en:7:image_wall:0  Get failed
10.16.18.14  2010-Jun-23  10:41:15  55731    popularProducts:2010-06-21:collegiate:en:7:image_wall:0  Set to expire in 86400 seconds
This is a small snippet for just one key, but it is representative of what
we're seeing. As you can see, we're setting the key to expire in one day,
yet subsequent GETs are failing well before the one-day mark... We thought
that perhaps we were reaching memory limits and hitting the LRU, however that
doesn't appear to be the case either; we're using ~18% of the memory allocated,
and a stats items call shows evicted and evicted_time are 0 for all slabs.


We've tried switching pools around to eliminate bad RAM as a cause, and we've
even run a pool on a local machine to eliminate network-related issues; we see
the same symptoms.

Any thoughts on what might be going on here?

As for vitals/system config:

Here's a recent stats dump:
STAT pid 19986
STAT uptime 8526687
STAT time 1277416425
STAT version 1.2.8
STAT pointer_size 64
STAT rusage_user 1510.430396
STAT rusage_system 3328.800037
STAT curr_items 313957
STAT total_items 0
STAT bytes 449831564
STAT curr_connections 1025
STAT total_connections 1
STAT connection_structures 1276
STAT cmd_flush 0
STAT cmd_get 14
STAT cmd_set 0
STAT get_hits 14
STAT get_misses 0
STAT evictions 0
STAT bytes_read 491
STAT bytes_written 18449
STAT limit_maxbytes 2147483648
STAT threads 17
STAT accepting_conns 1
STAT listen_disabled_num 0
STAT delete 16755
STAT replace 6515615

We're connecting to memcached through PHP 4.4.4 using the PECL memcache
extension configured thusly:
memcache support => enabled
memcache.allow_failover => 1 => 1
memcache.chunk_size => 8192 => 8192
memcache.default_port => 11211 => 11211
memcache.hash_function => crc32 => crc32
memcache.hash_strategy => consistent => consistent
memcache.max_failover_attempts => 20 => 20