Re: stats help
On Jul 24, 10:39 pm, Spike bchhedaf...@gmail.com wrote:

> hi, i am a newbie to memcached. I need help finding a throughput stat. I want to see how much throughput memcached is getting. The stats command does not list any stat for throughput (requests per sec). Any idea how to go about getting that info? Does memcached keep track of this information?

It's rare to keep derived stats like that in general. It's usually not interesting. Do you want average requests per second over the lifetime of the process? Over the last second? 60 seconds? 300, 900, 3600, etc.?

Most of the time, this is easily observable from the outside. Collect counters, wait a bit, collect them again, then do your own math. That'll give you exactly what you want.
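[Editor's note] Dustin's "collect counters, wait, collect again, do the math" suggestion can be sketched like this. This is a minimal illustration, not an official tool; the host/port, the 10-second interval, and the choice of `cmd_get`/`cmd_set` counters are all assumptions you would adapt.

```python
# Sketch: sample memcached's "stats" counters twice and compute
# per-second throughput from the delta, as suggested above.
import socket
import time

def parse_stats(payload):
    """Parse text-protocol 'stats' output into a dict of name -> value."""
    stats = {}
    for line in payload.splitlines():
        if line.startswith("STAT "):
            _, name, value = line.split(" ", 2)
            stats[name] = value
    return stats

def fetch_stats(host="127.0.0.1", port=11211):
    """Send 'stats' over the text protocol and return the parsed result."""
    with socket.create_connection((host, port), timeout=2) as s:
        s.sendall(b"stats\r\n")
        buf = b""
        while not buf.endswith(b"END\r\n"):
            buf += s.recv(4096)
    return parse_stats(buf.decode())

def rates(prev, curr, interval, counters=("cmd_get", "cmd_set")):
    """Per-second rates for monotonically increasing counters."""
    return {c: (int(curr[c]) - int(prev[c])) / interval for c in counters}

if __name__ == "__main__":
    a = fetch_stats()
    time.sleep(10)
    b = fetch_stats()
    print(rates(a, b, 10))  # per-second get/set rates over the window
```

Run against a live server, this prints roughly what "requests per second" would mean over the sampling window; longer windows smooth out bursts.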
Re: stats help
Hi,

Take stats every minute and calculate the delta, and you have throughput per minute.

On Sun, Jul 25, 2010 at 8:39 AM, Spike bchhedaf...@gmail.com wrote:

> hi, i am a newbie to memcached. I need help finding a throughput stat. I want to see how much throughput memcached is getting. The stats command does not list any stat for throughput (requests per sec). Any idea how to go about getting that info? Does memcached keep track of this information? Thanks for the help!
Re: Using PCIe SSDs instead of RAM
On Fri, Jul 23, 2010 at 8:47 AM, dormando dorma...@rydia.net wrote:

>> I tried. Try the engine branch?
>
> I guess I'll have to at some point. I just wanted to say that LRU was designed as an algorithm for a uniform cost model, where all elements are almost equally important (have the same cost of a miss) and the only thing that distinguishes them is the pattern of accesses. This is clearly not a good model for memcache, where: some elements are totally unimportant as they have already expired, some elements are larger than others, some are always processed in batches (multigets), and so on.
>
> In my opinion GC moves the reality closer to the model by removing unimportant elements, so if you want LRU to work correctly you should at least perform GC. You could also try to modify LRU to model the fact that one large item actually occupies space that could be better utilized by several small elements (this is also a simple change). If you feel comfortable without GC, I am OK with that; just do not suggest that GC is against LRU.

Alright, I'm sorry. I've been unfair to you (and a few others recently). I've been unnecessarily grumpy. I tried to explain myself as fairly as possible, and Dustin added the words that I had apparently already forgotten: these things are better pushed through via SEs (storage engines). I get annoyed by these threads because:

- I really don't care for arguments on this level. When I said GC goes against the LRU, I meant that the LRU we have doesn't require GC. The whole point of adding the LRU was so we could skip that part. I'm describing *intent*; I'm just too tired to keep arguing these things.
- The thread hijacking is seriously annoying. If you want to ping us about an ignored patch, start a new thread or necro your own old thread. :(
- Your original e-mail opened with "We run this in single-threaded mode and the performance is good enough for us, so please merge it."

I'm pretty dumbfounded that people can take a project which is supposed to be the performant underpinnings of the entire bloody internet and not do any sort of performance testing. I try to test things, and I do have some hardware on hand, but I'm still trying to find the motivation in myself to do a thorough performance run-through of the engine branch. There's a lot of stuff going on in there. This is time-consuming and often frustrating work.

You did make a good attempt at building an efficient implementation, and it's a very clever way to go about the business, but best case:

- You're adding logic to the most central global lock.
- You're adding 16 bytes per object.
- Plus some miscellaneous memory overhead (minor).

Even if the locks aren't causing problems, the memory efficiency drop is an issue for many more people. If we make changes to the memory requirements of the default engine, I really only want to entertain ideas that make it *drop* requirements (we have some; we need to start testing them as the engine stuff gets out there). The big picture is that many users have small items, and if we push this change many people will suffer.

Yes, it's true that once those metrics expose an issue you technically already have an issue, but it's not an instant dropoff. It's easily calculable with graphs and things like the evicted_time stats. Items dropping off the end that haven't been touched in 365,000+ seconds aren't likely to cause you a problem tomorrow or even next week, but watch for that number to fall. This is also why the evicted and evicted_nonzero stats were split: eviction of an item with a 0 expiration is nearly meaningless.

However, I can't seem to get this through without being rude to people, and I apologize for that. I should've responded to your original message with these *technical* problems instead of just harping on the idea that it looks like you weren't using all of the available statistics properly.
I'm trying to chillax and get back to being a fun (albeit grumpy) productive hacker dude. Sorry, all. -Dormando
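[Editor's note] The evicted_time advice above can be turned into a simple health check. A sketch follows; it assumes (per my reading of "stats items") that evicted_time is the idle age in seconds of a recently evicted item, and the function name and the one-day threshold are entirely illustrative.

```python
# Sketch: classify eviction pressure from two externally sampled stats,
# following the reasoning above: evictions of 0-expiration items are
# nearly meaningless, and evictions of long-idle items are harmless;
# trouble starts when recently used items begin falling off the tail.
def eviction_pressure(evicted_nonzero_delta, evicted_time, min_safe_age=86400):
    """Return a coarse health label for one sampling interval."""
    if evicted_nonzero_delta == 0:
        return "ok"        # nothing with a real TTL was evicted
    if evicted_time >= min_safe_age:
        return "cold"      # only items idle >= min_safe_age were evicted
    return "warning"       # recently used items are being evicted

# Example: items untouched for 365,000+ seconds dropping off is "cold",
# not an emergency -- but watch for evicted_time to fall over time.
```

The point of returning a label per interval is that the trend (graphed, e.g., via Munin or rrdtool) matters more than any single sample.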
Re: stats help
Dustin wrote:

>> hi, i am a newbie to memcached. I need help finding a throughput stat. I want to see how much throughput memcached is getting. The stats command does not list any stat for throughput (requests per sec). Any idea how to go about getting that info? Does memcached keep track of this information?
>
> It's rare to keep derived stats like that in general. It's usually not interesting. Do you want average requests per second over the lifetime of the process? Over the last second? 60 seconds? 300, 900, 3600, etc.? Most of the time, this is easily observable from the outside. Collect counters, wait a bit, collect them again, then do your own math. That'll give you exactly what you want.

It's a bit off topic for this list, but does anyone know if there are good generic tools for that? There are quite a few designed to convert SNMP 'COUNTER' types to rates, check thresholds, and keep history to graph the trends, but usually the SNMP sampling is closely coupled to the rest of the logic. I think OpenNMS might do it with values it can pick up via HTTP requests, but I'm not sure how well it handles the spikes that would appear from restarts and value rollovers.

-- Les Mikesell lesmikes...@gmail.com
Re: stats help
I know rrdtool (and the JRobin equivalent in Java) can do it, but that's a fairly low-level tool. I was hoping to find a generic framework that could accept either counter- or gauge-type values and do the rest for you, including a web graph display. I'd think this would be a common problem, but I haven't found any high-level tools that aren't married to SNMP for the input.

-Les

Gavin M. Roy wrote:

> I use RRDTool for this with DERIVE counter types. Collect the data you want from the stats command and use rrdtool to store the data every minute, graph it out with rrdtool graph, and you'll get your trended stats.
>
>>>> hi, i am a newbie to memcached. I need help finding a throughput stat. I want to see how much throughput memcached is getting. The stats command does not list any stat for throughput (requests per sec). Any idea how to go about getting that info? Does memcached keep track of this information?
>>>
>>> It's rare to keep derived stats like that in general. It's usually not interesting. Do you want average requests per second over the lifetime of the process? Over the last second? 60 seconds? 300, 900, 3600, etc.? Most of the time, this is easily observable from the outside. Collect counters, wait a bit, collect them again, then do your own math. That'll give you exactly what you want.
>>
>> It's a bit off topic for this list, but does anyone know if there are good generic tools for that? There are quite a few designed to convert SNMP 'COUNTER' types to rates, check thresholds, and keep history to graph the trends, but usually the SNMP sampling is closely coupled to the rest of the logic. I think OpenNMS might do it with values it can pick up via HTTP requests, but I'm not sure how well it handles the spikes that would appear from restarts and value rollovers.
>>
>> -- Les Mikesell lesmikes...@gmail.com
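[Editor's note] The restart/rollover spikes Les worries about are what rrdtool's DERIVE data-source type (with a minimum of 0) guards against. Done by hand, the guard amounts to something like the sketch below; the function name and the `max_rate` sanity bound are illustrative choices, not anything these tools mandate.

```python
# Sketch: convert raw counter samples into a rate, discarding the bogus
# spike that a naive delta produces when the counter resets (daemon
# restart) or wraps around its integer limit.
def counter_to_rate(prev_value, curr_value, interval, max_rate=1e9):
    """Return events/sec for the interval, or None to drop the sample."""
    delta = curr_value - prev_value
    if delta < 0:           # counter went backwards: restart or wrap
        return None         # skip instead of graphing a huge spike
    rate = delta / interval
    if rate > max_rate:     # implausibly large: likely a reset to zero
        return None         # that happened to land above prev_value
    return rate
```

A graphing frontend would record `None` as an unknown/gap, which renders as a hole in the graph rather than a misleading peak.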
Re: Using PCIe SSDs instead of RAM
Thanks for the explanation. I see that we have entirely different points of view, probably caused by totally different identified sets of bottlenecks, different usage, different configurations, etc. (I assume that you have greater experience, since mine is restricted to one company, with just 55 memcache machines). For example, you often talk about locks and CPU usage, while we observed that (not surprisingly to us) those O(1) operations are relatively insignificant compared to socket operations, which take ages.

I agree that 16 extra bytes is a serious problem, though. If I had time I would definitely try to implement a version that uses just 8 bytes or less (for example, by reimplementing TTL buckets as an array of pointers to items hashed by item address). This was just a proof of concept that you can have GC in O(1), which some people claimed to be difficult, and which turned out to work very well for us at nk.pl.

Sorry for the thread hijacking, and all.

On Sun, Jul 25, 2010 at 12:46 PM, dormando dorma...@rydia.net wrote:

>>> I tried. Try the engine branch?
>>
>> I guess I'll have to at some point. I just wanted to say that LRU was designed as an algorithm for a uniform cost model, where all elements are almost equally important (have the same cost of a miss) and the only thing that distinguishes them is the pattern of accesses. This is clearly not a good model for memcache, where: some elements are totally unimportant as they have already expired, some elements are larger than others, some are always processed in batches (multigets), and so on. In my opinion GC moves the reality closer to the model by removing unimportant elements, so if you want LRU to work correctly you should at least perform GC. You could also try to modify LRU to model the fact that one large item actually occupies space that could be better utilized by several small elements (this is also a simple change). If you feel comfortable without GC, I am OK with that; just do not suggest that GC is against LRU.
>
> Alright, I'm sorry. I've been unfair to you (and a few others recently). I've been unnecessarily grumpy. I tried to explain myself as fairly as possible, and Dustin added the words that I had apparently already forgotten: these things are better pushed through via SEs (storage engines). I get annoyed by these threads because:
>
> - I really don't care for arguments on this level. When I said GC goes against the LRU, I meant that the LRU we have doesn't require GC. The whole point of adding the LRU was so we could skip that part. I'm describing *intent*; I'm just too tired to keep arguing these things.
> - The thread hijacking is seriously annoying. If you want to ping us about an ignored patch, start a new thread or necro your own old thread. :(
> - Your original e-mail opened with "We run this in single-threaded mode and the performance is good enough for us, so please merge it."
>
> I'm pretty dumbfounded that people can take a project which is supposed to be the performant underpinnings of the entire bloody internet and not do any sort of performance testing. I try to test things, and I do have some hardware on hand, but I'm still trying to find the motivation in myself to do a thorough performance run-through of the engine branch. There's a lot of stuff going on in there. This is time-consuming and often frustrating work.
>
> You did make a good attempt at building an efficient implementation, and it's a very clever way to go about the business, but best case:
>
> - You're adding logic to the most central global lock.
> - You're adding 16 bytes per object.
> - Plus some miscellaneous memory overhead (minor).
>
> Even if the locks aren't causing problems, the memory efficiency drop is an issue for many more people. If we make changes to the memory requirements of the default engine, I really only want to entertain ideas that make it *drop* requirements (we have some; we need to start testing them as the engine stuff gets out there). The big picture is that many users have small items, and if we push this change many people will suffer.
>
> Yes, it's true that once those metrics expose an issue you technically already have an issue, but it's not an instant dropoff. It's easily calculable with graphs and things like the evicted_time stats. Items dropping off the end that haven't been touched in 365,000+ seconds aren't likely to cause you a problem tomorrow or even next week, but watch for that number to fall. This is also why the evicted and evicted_nonzero stats were split: eviction of an item with a 0 expiration is nearly meaningless.
>
> However, I can't seem to get this through without being rude to people, and I apologize for that. I should've responded to your original message with these *technical* problems instead of just harping on the idea that it looks like you weren't using all of the available statistics properly. I'm trying to chillax and get back to
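[Editor's note] The TTL-bucket scheme Jakub refers to can be modeled in a few lines. This is a toy sketch of the idea only, not memcached's actual C data layout or Jakub's patch; all names are illustrative, and it assumes the GC routine is driven once per second.

```python
# Sketch: O(1) garbage collection via TTL buckets. Items are grouped by
# their absolute expiry second, so expiring them means draining a single
# bucket per tick -- O(1) work per expired item, with no scan of the LRU.
from collections import defaultdict

class TTLBuckets:
    def __init__(self):
        self.buckets = defaultdict(set)  # expiry second -> set of keys
        self.store = {}                  # key -> (value, expiry second)

    def set(self, key, value, now, ttl):
        expiry = now + ttl
        old = self.store.get(key)
        if old is not None:              # re-set: unlink from old bucket
            self.buckets[old[1]].discard(key)
        self.store[key] = (value, expiry)
        self.buckets[expiry].add(key)

    def get(self, key, now):
        item = self.store.get(key)
        if item is None or item[1] <= now:
            return None                  # missing or already expired
        return item[0]

    def gc_tick(self, now):
        """Reclaim every item expiring at second `now`. Assumes the
        caller ticks once per second, so no bucket is skipped."""
        for key in self.buckets.pop(now, ()):
            self.store.pop(key, None)
```

The memory cost of the extra per-item bookkeeping (the bucket link) is exactly the kind of overhead the 16-bytes-per-object objection above is about; Jakub's "array of pointers hashed by item address" suggestion is one way to shrink it.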
Re: Using PCIe SSDs instead of RAM
On Sun, 25 Jul 2010, Jakub Łopuszański wrote:

> Thanks for the explanation. I see that we have entirely different points of view, probably caused by totally different identified sets of bottlenecks, different usage, different configurations, etc. (I assume that you have greater experience, since mine is restricted to one company, with just 55 memcache machines). For example, you often talk about locks and CPU usage, while we observed that (not surprisingly to us) those O(1) operations are relatively insignificant compared to socket operations, which take ages.
>
> I agree that 16 extra bytes is a serious problem, though. If I had time I would definitely try to implement a version that uses just 8 bytes or less (for example, by reimplementing TTL buckets as an array of pointers to items hashed by item address). This was just a proof of concept that you can have GC in O(1), which some people claimed to be difficult, and which turned out to work very well for us at nk.pl.
>
> Sorry for the thread hijacking, and all.

It's not hard to make it work; it's hard to make it work for everyone. There are lots of things I could add to memcached in a day each, but they would make it less accessible instead of more accessible at the end of the day.
Re: stats help
> hi, i am a newbie to memcached. I need help finding a throughput stat. I want to see how much throughput memcached is getting. The stats command does not list any stat for throughput (requests per sec). Any idea how to go about getting that info? Does memcached keep track of this information? Thanks for the help!

Hey Spike,

I've actually put together a plugin for Munin which does this, as well as graphing information at a per-slab level. It covers at least 90% of the information that memcached exposes via stats in some manner. This plugin requires memcached v1.4.2+. If you are running the memcached v1.2.x branch, let me know and I can point you to my older plugin.

You can find the memcached plugin for Munin I created here:
http://exchange.munin-monitoring.org/plugins/memcached-multigraph/details

And you can find the latest version of Munin at:
http://munin-monitoring.org/

Granted, this brings another piece of software into the mix, but you should be running something similar in a production environment anyway, so you can properly track a system's / environment's health as it grows. I haven't had much time to work on expanding the plugin further, but I hope to have some time to commit new changes and expand its abilities in the next few weeks.

Ciao,

-- Matt West mw...@zynga.com
write failure response from delete
Hi, I'm using memcached via the Ruby client library. Occasionally when I do a delete I get a WriteFailure response. What could cause this? How should I handle this situation (the library already attempts several retries)? Any suggestions would be welcome. Dan
Re: stats help
Plug plug: http://code.google.com/p/memcache-top/ Nicholas