Re: Diagnosing Corruption

2014-11-19 Thread labnextddx
For what it is worth, I'm hesitant to upgrade memcached to the latest 
version as a step to try and solve this issue.  It seems to me that since 
our installs have been running without issue for quite some time (close to 
a year), that there are other variables at play here.  I just don't 
understand the variables.  ;)

Thanks,

Mike


On Tuesday, November 18, 2014 2:00:46 PM UTC-4, labne...@gmail.com wrote:

 Hi There,

 I'm trying to diagnose a new problem with Memcache that seems to be
 happening with greater frequency.  The issue has to do with memcache get
 requests returning incorrect responses (data from other keys
 returned).  Restarting or flushing the servers seems to resolve the issue.

 Do any memcache veterans have any suggestions for how I might dig into this
 issue?  Stats that I might want to trace, log files to look at, etc.?  Does
 this symptom fit the description of any known issues?

 I'm keeping a casual eye on curr_connections, listen_disabled_num,
 accepting_conns, bytes, and limit_maxbytes (all show nothing unusual).
 I've verified that all servers and clients are set up in a consistent
 fashion.  I'm not sure where to go from here to better understand the
 problem.


 If it helps, I'm running 1.4.13 (Ubuntu 12.04 LTS) across 3 servers,
 connecting with PHP Memcache 3.0.6.


 Tips?

 Mike



-- 

--- 
You received this message because you are subscribed to the Google Groups 
memcached group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Diagnosing Corruption

2014-11-19 Thread labnextddx
I just had another failure.  After taking down my Apache web servers, and
before restarting memcached, I grabbed stats to see if they showed anything
of interest:

 - All 3 servers were reporting for duty following a getServerStatus (PHP
client call)
 - curr_connections was 8 on every instance (Apache was down but cron jobs
were up, so the count would have dropped considerably)
 - listen_disabled_num was 0 on every instance
 - accepting_conns was 1 on every instance
 - evictions was 0
 - All items across all instances had evicted, evicted_nonzero, and
evicted_time values of 0
 - All slabs across all instances had a total_pages value of 1
 - tailrepairs and outofmemory were 0 across all items in each instance
 - The global hit rate was 0.9937
 - get_hits was always* greater than cmd_set on a per-slab basis.  *One slab
reported both values as equal
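
For anyone wanting to capture the same counters automatically at the moment of a failure, here is a minimal sketch in Python (the thread's client is PHP; Python is used only for illustration). It relies on the text protocol's `stats` reply format of `STAT <name> <value>` lines ending with `END`. The function names are hypothetical, and the hit-rate formula (get_hits divided by get_hits plus get_misses) is an assumption, since the message does not say how 0.9937 was computed.

```python
import socket


def parse_stats(raw: bytes) -> dict:
    """Turn a raw 'stats' reply into a {name: value} dict of strings."""
    stats = {}
    for line in raw.decode("ascii").split("\r\n"):
        if line == "END":
            break
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def hit_rate(stats: dict) -> float:
    """Assumed definition: get_hits / (get_hits + get_misses)."""
    hits = int(stats.get("get_hits", 0))
    misses = int(stats.get("get_misses", 0))
    total = hits + misses
    return hits / total if total else 0.0


def fetch_stats(host: str, port: int = 11211, timeout: float = 2.0) -> dict:
    """Send 'stats' over the text protocol and parse the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        buf = b""
        while not buf.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection early
                break
            buf += chunk
    return parse_stats(buf)
```

Calling `fetch_stats` for each of the three servers and saving the result with a timestamp would give a before/after record around each incident.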


As far as I can tell, memcache is reporting that the world is fine and
dandy.  Should I be enlarging the scope of the search to look at OS-related
factors that could result in the client receiving bad data?  None of the
machines are dipping into swap.

Thanks,

Mike





Re: Diagnosing Corruption

2014-11-19 Thread Boris Partensky
Hi Mike, this sounds to me more like a client/coding error rather than
memcached server. That's where I would focus first.

Boris



Re: Diagnosing Corruption

2014-11-19 Thread labnextddx
Thanks Boris,

I haven't really given that much thought.  Out of curiosity, why do you
think the issue might be on the client end?  I ask because I really don't
have a sense of what to look for on that end, and I wonder if you might have
some suggestions.

Best,

Mike




Re: Diagnosing Corruption

2014-11-19 Thread dormando
You're probably getting spaces or newlines into your keys, which can cause
the client to desync from the server under the text protocol. Then you'll
get all sorts of junk in random keys (or random responses for keys which
are fine).

Either filtering those characters out or using the binary protocol should
fix that for you.
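
A minimal sketch of the filtering option, written in Python for illustration (the client in this thread is PHP). It assumes the text protocol's documented key rules: at most 250 bytes, with no whitespace or control characters. The names `is_safe_key` and `safe_key` and the `h_` prefix are hypothetical, not part of any memcached client.

```python
import hashlib

MAX_KEY_BYTES = 250  # key length limit in the memcached text protocol


def is_safe_key(key: str) -> bool:
    """Return True if the key can be sent safely over the text protocol."""
    raw = key.encode("utf-8")
    if not raw or len(raw) > MAX_KEY_BYTES:
        return False
    # Reject spaces, tabs, CR, LF and other control bytes (0x00-0x20, 0x7f).
    return all(b > 0x20 and b != 0x7F for b in raw)


def safe_key(key: str) -> str:
    """Return the key unchanged if safe, else a deterministic hashed stand-in."""
    if is_safe_key(key):
        return key
    # Hypothetical scheme: prefix plus SHA-1 hex digest of the original key.
    return "h_" + hashlib.sha1(key.encode("utf-8")).hexdigest()
```

Hashing an unsafe key, rather than stripping the offending characters, avoids two distinct keys such as "a b" and "ab" collapsing into one. Logging every key that fails `is_safe_key` would also show where the bad keys come from.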

On Wed, 19 Nov 2014, labnext...@gmail.com wrote:

 Hi Boris,
 I think I may have misled you.  It is not one or two keys that get
 corrupted; it seems that most (if not all) keys fetched return incorrect
 data.  For example, during one of these failures (just this morning), a
 session key (prefixed with session_) returned an array related to a
 customer record (prefixed with lab_), a key related to a customer returned
 a string related to a translation, and a key related to a translation
 returned

 All heck breaks loose (seemingly) across all keys.  A flush brings things 
 back into the fold.

 Make sense?

 Thanks,

 Mike


 On Wednesday, November 19, 2014 2:22:50 PM UTC-4, Boris wrote:
   I can think of many ways to screw up an application in the way that you
   describe. Simple programmer error can lead to this sort of behavior. I'd
   just log every set for that key, along with the type of value you are
   setting.
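
The logging suggestion above could look roughly like the following sketch, in Python for illustration only. It assumes a client object exposing `set(key, value, ...)` and `get(key, ...)`; the class name `LoggingCache` and the logger name `cache.audit` are hypothetical.

```python
import logging

log = logging.getLogger("cache.audit")


class LoggingCache:
    """Wrap a cache client and record the value type on every set and get."""

    def __init__(self, client):
        self._client = client

    def set(self, key, value, *args, **kwargs):
        log.info("SET key=%r type=%s", key, type(value).__name__)
        return self._client.set(key, value, *args, **kwargs)

    def get(self, key, *args, **kwargs):
        value = self._client.get(key, *args, **kwargs)
        log.info("GET key=%r type=%s", key, type(value).__name__)
        return value
```

Comparing the SET and GET lines for one key shows whether the wrong type was written by the application or only appeared on the read path.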


RE: Diagnosing Corruption

2014-11-19 Thread Jason CHAN
Does the key always return the wrong value, or only sometimes?  If it is
always wrong, the value was probably wrong when it was stored (or was
overwritten).  If it is wrong only at random, the corruption is probably
happening in the get command.
  For the first case, you can use the stats cachedump command to check whether
the key is correct. If the key is correct and you know how to use gdb, get a
core file and check whether the value stored for the key is correct. Check the
refcount of the key; it should be 1 most of the time. If the refcount of every
wrong key is greater than 1, that may be a refcount leak. Some bugs in old
versions could cause this, and they are fixed in the latest version, so I
suggest you upgrade.
  For the second case, I would expect a crash rather than wrong values for
most of the keys.
  I also suggest you upgrade memcached if you don't know how to use gdb. This
is the first time I have heard of this kind of issue (I use 1.4.15 and 1.4.20
in a big cache cluster).
  Good luck.
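
A small sketch for checking keys with stats cachedump, in Python for illustration. It assumes the 1.4.x reply format for `stats cachedump <slab_id> <limit>`, which to my knowledge is one `ITEM <key> [<size> b; <expiry> s]` line per item followed by `END`; the command is not a stable interface, so treat the format as an assumption and check it against your server. The function name is hypothetical.

```python
import re

# Assumed line format: ITEM <key> [<size> b; <expiry> s]
ITEM_RE = re.compile(rb"^ITEM (.+) \[(\d+) b; (\d+) s\]$")


def parse_cachedump(raw: bytes) -> list:
    """Return (key, size_bytes, expiry) tuples from a cachedump reply."""
    items = []
    for line in raw.split(b"\r\n"):
        if line == b"END":
            break
        match = ITEM_RE.match(line)
        if match:
            key = match.group(1).decode("utf-8", "replace")
            items.append((key, int(match.group(2)), int(match.group(3))))
    return items
```

With the parsed list in hand, checking whether an expected key such as a `session_` entry is present, and whether its size looks plausible, is a simple membership test.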


Issue 384 in memcached: memcached refuse to recv data if the client send too much data without recving

2014-11-19 Thread memcached

Status: New
Owner: 
Labels: Type-Defect Priority-Medium

New issue 384 by kelvin0...@gmail.com: memcached refuse to recv data if the  
client send too much data without recving

https://code.google.com/p/memcached/issues/detail?id=384

(I’m not sure whether it is a bug or a feature.)


### What steps will reproduce the problem?
1. Start a memcached server on port 11211.
2. Download the snippet from  
https://gist.github.com/mckelvin/6aaf1d14e7866719a9bc and make sure  
Python 2 is available.
3. Run: python memcached_reproduce.py

### What is the expected output? What do you see instead?
The reproduction script is expected to exit with status 0. If the bug occurs,  
the Python process sits idle and never times out.


### What version of the product are you using? On what operating system?
CANNOT be reproduced on versions 1.4.5 (Gentoo), 1.4.13 (Ubuntu), 1.4.15 (OS X).
CAN be reproduced on versions 1.4.17 (Gentoo), 1.4.20 (OS X).
Not all of the versions between 1.4.5 and 1.4.20 have been tested yet, but  
I guess it was introduced in 1.4.16 or 1.4.17 (if it’s a bug).


### Please provide any additional information below.

The issue is in storage commands. The doc says:

The client sends a command line, and then a data block; after that the  
client expects one line of response, which will indicate success or  
failure.


What if I send N (N > 1) storage commands at once, and then expect N lines  
of response? This behaviour is not mentioned in the doc, and I’m not sure if  
it is acceptable. If it is not, you may close the issue directly; otherwise  
this should be a bug.
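
To make the pattern in question concrete, here is a rough Python sketch of sending N storage commands in one go and only then reading N response lines. It is not the linked gist, whose contents are not reproduced here; the helper names, the `key_<i>` naming, and the payload of repeated `x` bytes are all hypothetical. The wire layout follows the text protocol: `set <key> <flags> <exptime> <bytes>`, then the data block, each terminated by CRLF.

```python
import socket


def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Encode one text-protocol 'set': command line, then the data block."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_pipeline(n: int, value_size: int) -> bytes:
    """Concatenate n 'set' commands into a single buffer."""
    value = b"x" * value_size
    return b"".join(build_set(f"key_{i}", value) for i in range(n))


def send_then_read(host: str, port: int, payload: bytes, n: int,
                   timeout: float = 5.0) -> list:
    """Send everything first, then read n response lines.

    The socket timeout makes a stall surface as an exception instead of
    an indefinite hang.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)  # can stall if the server stops reading
        reader = sock.makefile("rb")
        return [reader.readline().rstrip(b"\r\n") for _ in range(n)]
```

With a large enough N and value size, unread replies can fill the socket buffers on both sides. A client that wants to pipeline safely would read responses while it writes, rather than after.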


code to reproduce: https://gist.github.com/mckelvin/6aaf1d14e7866719a9bc




I know the client is to blame for sending so much data while refusing to  
receive anything, but the server's behaviour is not consistent across  
these versions, which sounds buggy.


--
You received this message because this project is configured to send all  
issue notifications to this address.

You may adjust your notification preferences at:
https://code.google.com/hosting/settings
