Re: stats help

2010-07-25 Thread Dustin

On Jul 24, 10:39 pm, Spike bchhedaf...@gmail.com wrote:
 hi, i am a newbie to memcached. I need help in finding out how to get
 a throughput stat.

 I want to see how much throughput memcached is getting. The stats command
 does not list any stat for throughput (requests per sec). Any idea on
 how to go about getting that info? Does memcached keep track of this
 information?

  It's rare to keep derived stats like that in general.  It's usually
not interesting.  Do you want average requests per second over the
lifetime of the process?  Over the last second?  60 seconds?  300, 900,
3600, etc.?

  Most of the time, this is easily observable from the outside.
Collect counters -- wait a bit, collect them again, then do your own
math.  That'll give you exactly what you want.
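Dustin's suggestion can be sketched in a few lines of Python. This is a hypothetical illustration (the function and variable names are made up), assuming you have already fetched two `stats` snapshots some interval apart via whatever client library you use:

```python
# Compute requests/sec from two memcached "stats" snapshots taken
# `interval` seconds apart. Snapshots are plain dicts mapping counter
# names to values, as most client libraries return them.

def throughput(before, after, interval, counters=("cmd_get", "cmd_set")):
    """Return requests per second over the sampling interval."""
    total = sum(int(after[c]) - int(before[c]) for c in counters)
    return total / interval

# Example with made-up counter values sampled 60 seconds apart:
t0 = {"cmd_get": 1_000_000, "cmd_set": 200_000}
t1 = {"cmd_get": 1_060_000, "cmd_set": 206_000}
print(throughput(t0, t1, 60))  # → 1100.0
```

The same delta arithmetic works for any monotonically increasing counter in the stats output.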


Re: stats help

2010-07-25 Thread Olga Khenkin
Hi,

Take stats every minute, calculate the delta - and you have throughput per
minute.





Re: Using PCIe SSDs instead of RAM

2010-07-25 Thread dormando
 On Fri, Jul 23, 2010 at 8:47 AM, dormando dorma...@rydia.net wrote:
   I tried.

   Try the engine branch?

 I guess, I'll have to at some point.

 Just wanted to say that LRU was designed as an algorithm for a uniform cost 
 model, where all elements are almost equally important (have the same cost of 
 a miss) and the only thing that distinguishes them is
 the pattern of accesses. This is clearly not a good model for memcache, 
 where some elements are totally unimportant as they have already expired, 
 some elements are larger than the others, some are always
 processed in batches (multigets), and so on. In my opinion GC moves the 
 reality closer to the model by removing unimportant elements, so if you want 
 LRU to work correctly you should at least perform GC.
 You could also try to modify LRU to model that one large item actually 
 occupies space that could be better utilized by several small elements (this 
 is also a simple change). If you feel comfortable without
 GC, I am OK with that; just do not suggest that GC is against LRU.

Alright, I'm sorry. I've been unfair to you (and a few others recently).
I've been unnecessarily grumpy. I tried to explain myself as fairly as
possible, and Dustin added the words that I apparently forgot already, in
that these things are better pressed through via SE's.

I get annoyed by these threads because:

- I really don't care for arguments on this level. When I said GC goes
against the LRU I mean that the LRU we have doesn't require GC. The whole
point of adding the LRU was so we could skip that part. I'm describing
*intent*, I'm just too tired to keep arguing these things.

- The thread hijacking is seriously annoying. If you want to ping us about
an ignored patch, start a new thread or necro your own old thread. :(

- Your original e-mail opened with "We run this in single threaded mode
and the performance is good enough for us so please merge it." I'm pretty
dumbfounded that people can take a project which is supposed to be the
performant underpinnings of the entire bloody internet and not do any sort
of performance testing.

I try to test things, and I do have some hardware on hand, but I'm still
trying to find the motivation in myself to do a thorough performance
run-through of the engine branch. There's a lot of stuff going on in
there. This is time-consuming and often frustrating work.

You did make a good attempt at building an efficient implementation, and
it's a very clever way to go about the business, but best case:

- You're adding logic to the most central global lock
- You're adding 16 bytes per object
- Plus some misc memory overhead (minor).

Even if they don't turn out to cause lock contention problems, the memory
efficiency drop is an issue for many more people. If we make changes to the memory
requirements of the default engine, I really only want to entertain ideas
that make it *drop* requirements (we have some, need to start testing
them as the engine stuff gets out there).

The big picture is many users have small items, and if we push this change
many people will suffer.

Yes, it's true that once those metrics expose an issue you technically
already have an issue, but it's not an instant dropoff. It's easily tracked
with graphs and things like the evicted_time stats. Items dropping off
the end that haven't been touched in 365,000+ seconds aren't likely to
cause you a problem tomorrow or even next week, but watch for that number
to fall. This is also why the evicted and evicted_nonzero stats were
split. Eviction of an item with a 0 expiration is nearly meaningless.
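As a sketch of the monitoring idea described above (the helper name is made up and the threshold is illustrative, not official guidance): given per-slab values parsed from `stats items`, flag slabs whose `evicted_time` (the age in seconds of the last evicted item) is getting low.

```python
# Flag slabs whose last-evicted item was used recently. `slab_stats`
# maps slab id -> dict of values parsed from "stats items". A low
# evicted_time means the LRU is evicting items that were touched
# recently, i.e. that slab class is genuinely short on memory.

def slabs_under_pressure(slab_stats, min_age_seconds=3600):
    return [
        slab
        for slab, s in sorted(slab_stats.items())
        if int(s.get("evicted_nonzero", 0)) > 0
        and int(s.get("evicted_time", 0)) < min_age_seconds
    ]

stats = {
    1: {"evicted_nonzero": 10, "evicted_time": 365000},  # ancient items: fine
    5: {"evicted_nonzero": 99, "evicted_time": 120},     # hot items evicted: bad
}
print(slabs_under_pressure(stats))  # → [5]
```

Graphing that number over time shows the downward trend dormando suggests watching for, well before evictions start hurting.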

However, I can't seem to get this through without being rude to people,
and I apologize for that. I should've responded to your original message
with these *technical* problems instead of just harping on the idea that
it looks like you weren't using all of the available statistics properly.

I'm trying to chillax and get back to being a fun (albeit grumpy)
productive hacker dude. Sorry, all.

-Dormando


Re: stats help

2010-07-25 Thread Les Mikesell

Dustin wrote:

  It's rare to keep derived stats like that in general.  It's usually
not interesting.  Do you want average requests per second over the
lifetime of the process?  Over the last second?, 60 seconds? 300, 900,
3600, etc...

  Most of the time, this is easily observable from the outside.
Collect counters -- wait a bit, collect them again, then do your own
math.  That'll give you exactly what you want.


It's a bit off topic for this list, but does anyone know if there are good 
generic tools for that?  There are quite a few designed to convert SNMP 
'COUNTER' types to rates, check thresholds and keep history to graph the trends, 
but usually the SNMP sampling is closely coupled to the rest of the logic.  I 
think OpenNMS might do it with values it can pick up with http requests but I'm 
not sure how well it handles the spikes that would appear from restarts and 
value rollovers.
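The restart/rollover spikes Les mentions can be handled in the sampling layer. A minimal sketch (a hypothetical helper, not part of any of the tools named above) is to treat a counter that went backwards as a reset and discard that sample instead of emitting a spike:

```python
# Convert successive counter samples into a rate, discarding samples
# where the counter went backwards (daemon restart or rollover),
# rather than emitting a huge bogus spike into the graph.

def rate(prev_value, prev_time, value, now):
    """Return counter rate per second, or None if the sample is unusable."""
    dt = now - prev_time
    if dt <= 0 or value < prev_value:
        return None  # restart/rollover: wait for the next clean interval
    return (value - prev_value) / dt

print(rate(1000, 0, 4000, 60))   # → 50.0
print(rate(4000, 60, 100, 120))  # → None (counter reset)
```

This is essentially what RRDtool's COUNTER/DERIVE data-source handling does internally.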


--
  Les Mikesell
   lesmikes...@gmail.com



Re: stats help

2010-07-25 Thread Les Mikesell
I know rrdtool (and the jrobin equivalent in java) can do it, but that's a 
fairly low level tool.  I was hoping to find some generic framework that could 
accept either counter or gauge type values and do the rest for you including a 
web graph display.  I'd think this would be a common problem but I haven't found 
any high-level tools that aren't married to snmp for the input.


 -Les


Gavin M. Roy wrote:
I use RRDTool for this with derive counter types.  Collect the data you 
want from the stats command and use rrdtool to store the data every 
minute, graph it out with rrdtool graph and you'll get your trended stats.







Re: Using PCIe SSDs instead of RAM

2010-07-25 Thread Jakub Łopuszański
Thanks for an explanation.

I see that we have entirely different points of view, probably caused by
totally different identified sets of bottlenecks, different usage, different
configurations, etc. (I assume that you have greater experience, since mine is
restricted to one company, with just 55 memcache machines). For example, you
often talk about locks and CPU usage, while we observed that (not
surprisingly to us) those O(1) operations are relatively insignificant
compared to socket operations, which take ages.

I agree that 16 extra bytes is a serious problem though. If I had time I
would definitely try to implement a version that uses just 8 bytes or less
(for example by reimplementing TTL buckets as an array of pointers to items
hashed by item address). This was just a proof of concept that you can have
GC in O(1), which some people claimed to be difficult, and which turned out
to work very well for us at nk.pl.

Sorry for the thread hijacking, and all.
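The TTL-bucket idea can be illustrated with a toy sketch (this is not the actual patch, and all names are made up): keep a bucket of keys per expiry second, and on each clock tick reap only the bucket that just expired, so the GC work is O(1) per expired item.

```python
import collections

class TtlBucketCache:
    """Toy cache with O(1)-per-item garbage collection via TTL buckets."""

    def __init__(self):
        self.data = {}                               # key -> value
        self.buckets = collections.defaultdict(set)  # expiry second -> keys
        self.now = 0

    def set(self, key, value, ttl):
        self.data[key] = value
        self.buckets[self.now + ttl].add(key)

    def tick(self):
        """Advance the clock one second and reap the bucket that expired."""
        self.now += 1
        for key in self.buckets.pop(self.now, ()):  # O(items expiring now)
            self.data.pop(key, None)

c = TtlBucketCache()
c.set("a", 1, ttl=1)
c.set("b", 2, ttl=5)
c.tick()
print(sorted(c.data))  # → ['b']
```

A real implementation has more to handle (keys re-set with a new TTL leave stale bucket entries, and the buckets themselves cost memory per item, which is exactly the overhead being debated in this thread).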


Re: Using PCIe SSDs instead of RAM

2010-07-25 Thread dormando


On Sun, 25 Jul 2010, Jakub Łopuszański wrote:

 Thanks for an explanation.
 I see that we have entirely different points of view, probably caused by 
 totally different identified sets of bottlenecks, different
 usage, different configurations etc (I assume that you have greater 
 experience, since my is restricted to one company, with just 55
 memcache machines). For example you often say about the locks and CPU usage, 
 while we observed that (not surprisingly to us) those O(1)
 operations, are relatively insignificant compared to socket operations which 
 take ages. 

 I agree that 16 extra bytes is a serious problem though. If I had time I 
 would definitely try to implement a version that uses just 8
 bytes or less (for example by reimplementing TTL buckets as an array of 
 pointers to items hashed by item address). This was just a proof
 of concept, that you can have GC in O(1), which some ppl claimed to be 
 difficult, which turned out to work very well for us at nk.pl.

 Sorry for tread hijacking, and all.

It's not hard to make it work, it's hard to make it work for everyone.
There are lots of things that I could add to memcached in a day each, but
it would make it less accessible instead of more accessible at the end of
the day.


Re: stats help

2010-07-25 Thread Matthew West

Hey Spike,

   I've actually put together a plugin for Munin which does this, as well as
graphs information on a per-slab level. It covers at least 90% of all the
information that memcached will give in stats in some manner. This plugin
requires memcached v1.4.2+. If you are running the memcached 1.2.x branch,
let me know and I can point you to my older plugin.

You can find the memcached plugin for munin I created here at:
http://exchange.munin-monitoring.org/plugins/memcached-multigraph/details

And you can find the latest version of munin at:
http://munin-monitoring.org/

Granted this brings another piece of software into the mix, but you should
be running something similar to this in a production environment so you can
properly track a system / environment's health as it grows.

I haven't had much time to work on expanding the plugin even more, but I
hope to have some time to commit some new changes / expand its abilities
even more in the next few weeks.

Ciao,

-- 

Matt West
mw...@zynga.com



write failure response from delete

2010-07-25 Thread daniel hoey
Hi,

I'm using memcached via the Ruby client library. Occasionally when I
do a delete I get a WriteFailure response. What could cause this?
How should I handle this situation (the library already attempts
several retries)? Any suggestions would be welcome.

Dan


Re: stats help

2010-07-25 Thread ntang
Plug plug:

http://code.google.com/p/memcache-top/

Nicholas