Re: Diagnosing Corruption
For what it is worth, I'm hesitant to upgrade memcached to the latest version as a step toward solving this issue. Our installs have been running without issue for quite some time (close to a year), so it seems to me that there are other variables at play here. I just don't understand the variables. ;)

Thanks,
Mike

On Tuesday, November 18, 2014 2:00:46 PM UTC-4, labne...@gmail.com wrote:

> Hi There,
>
> I'm trying to diagnose a new problem with Memcache that seems to be happening with greater frequency. The issue has to do with memcache get requests returning incorrect responses (data from other keys returned). Restarting or flushing the servers seems to resolve the issue.
>
> Do any memcache veterans have any suggestions of how I might dig into this issue? Stats that I might want to trace, log files to look at, etc? Does this symptom fit the description of any known issues?
>
> I'm keeping a casual eye on curr_connections, listen_disabled_num, accepting_conns, bytes, and limit_maxbytes (all show nothing unusual). I've verified that all servers and clients are set up in a consistent fashion. I'm not sure where to go from here to better understand the problem.
>
> If it helps, I'm running 1.4.13 (Ubuntu 12.04 LTS) across 3 servers, connecting with PHP Memcache 3.0.6.
>
> Tips?
>
> Mike
Re: Diagnosing Corruption
I just had another failure. After pulling down my Apache web servers, and before restarting memcached, I grabbed stats to see if they showed anything of interest:

- All 3 servers were reporting for duty following a getServerStatus (PHP client call)
- curr_connections was listed as 8 across all the instances (Apache was down but cron jobs were up, so that would have dropped things down considerably)
- listen_disabled_num was listed as 0 across all the instances
- accepting_conns was listed as 1 across all the instances
- evictions was listed as 0
- All items across all instances had evicted, evicted_nonzero, and evicted_time values of 0
- All slabs across all instances had a total_pages value of 1
- tailrepairs and outofmemory were listed with a value of 0 across all items in each instance
- global hit rate is 0.9937
- get_hits is always* greater than cmd_set on a per-slab basis

*One slab reported both values as equal.

As far as I can tell, memcache is reporting that the world is fine and dandy. Should I be enlarging the scope of the search to look at OS-related factors that could result in the client receiving bad data? None of the machines are dipping into swap.

Thanks,
Mike
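The counters listed in this message can be collected from every server in one pass, which makes it easier to compare snapshots taken before and during a failure. The following is a minimal sketch, assuming the servers speak the memcached text protocol on port 11211; the `WATCHED` tuple and the function names are illustrative choices, not part of any client library.

```python
import socket

# Counters mentioned in the thread; this selection is an assumption,
# extend it with whatever else you want to trace.
WATCHED = (
    "curr_connections",
    "listen_disabled_num",
    "accepting_conns",
    "evictions",
    "bytes",
    "limit_maxbytes",
)


def parse_stats(raw: bytes) -> dict:
    """Parse a text-protocol `stats` reply ("STAT <name> <value>" lines, then "END")."""
    stats = {}
    for line in raw.decode("ascii", errors="replace").splitlines():
        if line == "END":
            break
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def fetch_stats(host: str, port: int = 11211, timeout: float = 2.0) -> dict:
    """Send `stats` to one server and return the parsed reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        buf = b""
        while not buf.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection early
                break
            buf += chunk
    return parse_stats(buf)


def summarize(stats: dict) -> dict:
    """Keep only the watched counters; missing ones come back as None."""
    return {name: stats.get(name) for name in WATCHED}

# Example use (host names are hypothetical):
#   for host in ("cache1", "cache2", "cache3"):
#       print(host, summarize(fetch_stats(host)))
```

Since all of these counters looked healthy during the failure, a snapshot like this is mostly useful for ruling the server out rather than for finding the cause.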
Re: Diagnosing Corruption
Hi Mike,

This sounds to me more like a client/coding error than a memcached server problem. That's where I would focus first.

Boris
Re: Diagnosing Corruption
Thanks Boris,

I haven't really given that much thought. Out of curiosity, why do you think the issue might be on the client end? I ask because I really don't have a sense of what to look for on that end, and I wonder if you might have some suggestions.

Best,
Mike
Re: Diagnosing Corruption
You're probably getting spaces or newlines into your keys, which can cause the client protocol to desync with the server. Then you'll get all sorts of junk in random keys (or random responses from keys that are fine).

Either filtering those characters or using the binary protocol should fix that for you.

On Wed, 19 Nov 2014, labnext...@gmail.com wrote:

> Hi Boris,
>
> I think I may have misled you. It is not one or two keys that get corrupted; it seems that most (if not all) keys fetched return incorrect data. For example, during one of these failures (just this morning), a session key (prefixed with session_) returned an array related to a customer record (prefixed with lab_), a key related to a customer returned a string related to a translation, and a key related to a translation returned All heck breaks loose (seemingly) across all keys. A flush brings things back into the fold.
>
> Make sense?
>
> Thanks,
> Mike
>
> On Wednesday, November 19, 2014 2:22:50 PM UTC-4, Boris wrote:
>
> > I can think of many ways to screw up an application in a way that you describe. Simple programmer error can lead to this sort of behavior. I'd just log every time you do a set for that key with the value type you are setting.
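The filtering suggested in this reply can be sketched as follows. In the text protocol a key may be at most 250 bytes and must not contain whitespace or control characters; a space turns `get a b` into a two-key fetch and a newline starts a second command, which is how replies end up paired with the wrong requests. The function names and the fallback of hashing unsafe keys are assumptions made for illustration, and the sketch is in Python rather than the PHP client used in the thread.

```python
import hashlib

MAX_KEY_BYTES = 250  # key length limit of the memcached text protocol


def is_safe_key(key: str) -> bool:
    """True if the key is non-empty, short enough, and free of whitespace/control bytes."""
    raw = key.encode("utf-8")
    if not raw or len(raw) > MAX_KEY_BYTES:
        return False
    # Bytes <= 0x20 cover space, tab, CR, LF and other control characters; 0x7f is DEL.
    return all(b > 0x20 and b != 0x7F for b in raw)


def safe_key(key: str, prefix: str = "h_") -> str:
    """Return the key unchanged if it is safe, otherwise a hashed stand-in.

    Hashing (rather than stripping characters) is an illustrative choice:
    it keeps distinct unsafe keys distinct.
    """
    if is_safe_key(key):
        return key
    return prefix + hashlib.sha1(key.encode("utf-8")).hexdigest()
```

Logging every key that fails `is_safe_key` before it reaches the client is a cheap way to confirm or rule out this theory, because the offending keys are often built from user input.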
RE: Diagnosing Corruption
Does the key always return the wrong value, or is it only wrong sometimes?

If it always returns the wrong value, the value was most likely wrong when it was stored (or it was overwritten). If it returns the wrong value only at random, the corruption is more likely happening in the get command.

For the first case, you can use the stats cachedump command to check whether the key is correct. If the key is correct and you know how to use gdb, get a core file and check whether the value of the key is correct. Also check the refcount of the key; it should be 1 most of the time. If the refcount of every wrong key is greater than 1, that may be a refcount leak. Some bugs in old versions could cause this, and they are fixed in the latest version, so I suggest you upgrade.

For the second case, I would expect memcached to crash, not just return the wrong value for most of the keys. I also suggest you upgrade memcached if you don't know how to use gdb.

This is the first time I have heard of this kind of issue (I use 1.4.15 and 1.4.20 in a big cache cluster).

Good luck.

-----Original Message-----
From: memcached@googlegroups.com [mailto:memcached@googlegroups.com] On Behalf Of dormando
Sent: Thursday, November 20, 2014 9:09 AM
To: memcached@googlegroups.com
Subject: Re: Diagnosing Corruption

> You're probably getting spaces or newlines into your keys, which can cause the client protocol to desync with the server. Then you'll get all sorts of junk into random keys (or random responses from keys which're fine). Either filtering those or using the binary protocol should fix that for you.
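One way to tell the two cases in this reply apart from the application side is to store the key inside the value and compare it on every read. If the embedded key differs from the requested key, the reply belonged to a different request, which points at the get path or a protocol desync rather than at a bad set. This is only a sketch under stated assumptions: the JSON envelope and the names `wrap` and `unwrap` are hypothetical, and the values must be JSON-serializable.

```python
import json


def wrap(key, value) -> str:
    """Build the payload to store: the value plus the key it was stored under."""
    return json.dumps({"k": key, "v": value})


def unwrap(requested_key, payload):
    """Return (ok, value); ok is False if the payload is not an envelope
    or was stored under a different key than the one requested."""
    try:
        envelope = json.loads(payload)
    except (TypeError, ValueError):
        return False, None
    if not isinstance(envelope, dict) or envelope.get("k") != requested_key:
        return False, None
    return True, envelope.get("v")
```

In use, the application would pass `wrap(key, value)` to its normal set call and run `unwrap(key, result)` on every get, logging the requested key and the embedded key whenever the check fails.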
Issue 384 in memcached: memcached refuse to recv data if the client send too much data without recving
Status: New
Owner:
Labels: Type-Defect Priority-Medium

New issue 384 by kelvin0...@gmail.com: memcached refuse to recv data if the client send too much data without recving
https://code.google.com/p/memcached/issues/detail?id=384

(I’m not sure whether it is a bug or a feature.)

### What steps will reproduce the problem?

1. Start a memcached server on port 11211.
2. Download the snippet https://gist.github.com/mckelvin/6aaf1d14e7866719a9bc and make sure python(2) is available.
3. Run: python memcached_reproduce.py

### What is the expected output? What do you see instead?

The reproduction script is expected to exit with status 0. If the bug occurs, the Python process sits idle and never times out.

### What version of the product are you using? On what operating system?

The issue CANNOT be reproduced on version 1.4.5 (Gentoo), 1.4.13 (Ubuntu), or 1.4.15 (OS X). It CAN be reproduced on version 1.4.17 (Gentoo) and 1.4.20 (OS X). Not all of the versions between 1.4.5 and 1.4.20 have been tested yet, but I guess it was introduced in 1.4.16 or 1.4.17 (if it’s a bug).

### Please provide any additional information below.

The issue is in storage commands. The doc says:

"The client sends a command line, and then a data block; after that the client expects one line of response, which will indicate success or failure."

What if I send N (N > 1) storage commands at once, and then expect N lines of response? This behaviour is not mentioned in the doc, and I’m not sure whether it is acceptable. If it is not, you may close the issue directly; otherwise this should be a bug.

Code to reproduce: https://gist.github.com/mckelvin/6aaf1d14e7866719a9bc

I know the client is to blame for sending so much data while refusing to receive anything, but the server's behaviour changed between these versions, which sounds buggy.
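On the client side, the stall described in this issue can be avoided by never letting unread replies pile up: send storage commands in bounded batches and read one reply line per command before sending the next batch. The sketch below illustrates that idea only; it is not the code from the linked gist, and the batch size of 100 and all function names are assumptions.

```python
import socket


def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Encode one text-protocol storage command: command line, data block, CRLF."""
    header = "set {} {} {} {}\r\n".format(key, flags, exptime, len(value))
    return header.encode("ascii") + value + b"\r\n"


def batches(items: list, size: int):
    """Yield consecutive slices of `items` with at most `size` entries each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def pipelined_set(sock: socket.socket, items: list, batch_size: int = 100) -> list:
    """Send (key, value) pairs in batches, draining the replies after each batch.

    Reading between batches keeps the server's outgoing replies from backing up
    while the client is still writing.
    """
    replies = []
    reader = sock.makefile("rb")
    for batch in batches(items, batch_size):
        sock.sendall(b"".join(build_set(key, value) for key, value in batch))
        for _ in batch:
            replies.append(reader.readline().rstrip(b"\r\n"))
    return replies

# Example use (assumes a server on localhost:11211):
#   with socket.create_connection(("localhost", 11211)) as sock:
#       print(pipelined_set(sock, [("k%d" % i, b"x" * 1024) for i in range(1000)]))
```

The right batch size depends on value sizes and socket buffer sizes, so treat the default here as a starting point rather than a recommendation.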