Re: Using PCIe SSDs instead of RAM

2010-07-25 Thread dormando

Alright, I'm sorry. I've been unfair to you (and a few others recently).
I've been unnecessarily grumpy. I tried to explain myself as fairly as
possible, and Dustin added the words that I had apparently already
forgotten: these things are better pushed through via SEs (storage engines).

I get annoyed by these threads because:

- I really don't care for arguments on this level. When I said GC goes
against the LRU, I meant that the LRU we have doesn't require GC. The whole
point of adding the LRU was so we could skip that part. I'm describing
*intent*; I'm just too tired to keep arguing these things.

- The thread hijacking is seriously annoying. If you want to ping us about
an ignored patch, start a new thread or necro your own old thread. :(

- Your original e-mail opened with "we run this in single threaded mode
and the performance is good enough for us, so please merge it". I'm pretty
dumbfounded that people can take a project which is supposed to be the
performant underpinnings of the entire bloody internet and not do any sort
of performance testing.

I try to test things, and I do have some hardware on hand, but I'm still
trying to find the motivation in myself to do a thorough performance
run-through of the engine branch. There's a lot of stuff going on in
there. This is time-consuming and often frustrating work.

You did make a good attempt at building an efficient implementation, and
it's a very clever way to go about the business, but best case:

- You're adding logic to the most central global lock
- You're adding 16 bytes per object (see the sketch below)
- Plus some misc memory overhead (minor).
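
To put the 16 bytes in context: a minimal sketch (my reading of the cost,
not the patch's actual layout), assuming a 64-bit build where the GC
bookkeeping is two extra pointers per item:

/* Hypothetical sketch (not the actual patch): linking every item into
 * one extra doubly-linked structure (e.g. a TTL bucket) costs two
 * pointers on a 64-bit build = 16 bytes per item. */
#include <stdio.h>

struct item_gc_overhead {
    struct item_gc_overhead *ttl_prev;  /* 8 bytes */
    struct item_gc_overhead *ttl_next;  /* 8 bytes */
};

int main(void) {
    /* For a cache full of ~100-byte items, that's ~16% more memory. */
    printf("extra bytes per item: %zu\n", sizeof(struct item_gc_overhead));
    return 0;
}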

Even if the extra logic doesn't turn the locks into a problem, the memory
efficiency drop is an issue for many more people. If we make changes to the
memory requirements of the default engine, I really only want to entertain
ideas that make it *drop* requirements (we have some; need to start testing
them as the engine stuff gets out there).

The big picture is that many users have small items, and if we push this
change many people will suffer.

Yes, it's true that once those metrics expose an issue you technically
already have an issue, but it's not an instant dropoff; it's easily
calculable with graphs and things like the evicted_time stats. Items
dropping off the end that haven't been touched in 365,000+ seconds aren't
likely to cause you a problem tomorrow or even next week, but watch for
that number to fall. This is also why the evicted and evicted_nonzero
stats were split. Eviction of an item with a 0 expiration is nearly
meaningless.
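
For illustration, a sketch of how that watch-the-trend check could look
(my own toy, not an official tool; the one-day alert threshold is an
arbitrary assumption):

/* Sketch: track items:*:evicted_time (idle seconds of the most
 * recently evicted item) across monitoring samples. Large and stable
 * is fine; falling toward your reuse window is the early warning. */
#include <stdio.h>

#define IDLE_ALERT_SECONDS (24 * 3600)  /* assumed alert threshold */

static void check_evicted_time(const long *samples, int n) {
    for (int i = 1; i < n; i++) {
        if (samples[i] < samples[i - 1] && samples[i] < IDLE_ALERT_SECONDS)
            printf("sample %d: evicted_time fell to %lds -- add memory?\n",
                   i, samples[i]);
    }
}

int main(void) {
    long evicted_time[] = { 400000, 365000, 90000, 3600 };
    check_evicted_time(evicted_time, 4);  /* flags the 3600s sample */
    return 0;
}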

However, I can't seem to get this through without being rude to people,
and I apologize for that. I should've responded to your original message
with these *technical* problems instead of just harping on the idea that
it looks like you weren't using all of the available statistics properly.

I'm trying to chillax and get back to being a fun (albeit grumpy)
productive hacker dude. Sorry, all.

-Dormando


Re: Using PCIe SSDs instead of RAM

2010-07-25 Thread Jakub Łopuszański
Thanks for the explanation.

I see that we have entirely different points of view, probably caused by
totally different identified sets of bottlenecks, different usage, different
configurations, etc. (I assume that you have greater experience, since mine
is restricted to one company, with just 55 memcache machines). For example,
you often talk about locks and CPU usage, while we observed that (not
surprisingly to us) those O(1) operations are relatively insignificant
compared to socket operations, which take ages.

I agree that 16 extra bytes is a serious problem though. If I had time I
would definitely try to implement a version that uses just 8 bytes or less
(for example, by reimplementing TTL buckets as an array of pointers to
items, hashed by item address). This was just a proof of concept that you
can have GC in O(1), which some people claimed to be difficult, and which
turned out to work very well for us at nk.pl.
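
To make the idea concrete, a minimal sketch of O(1) TTL-bucket GC (an
illustration of the technique, not the actual nk.pl patch):

/* Items are chained into the bucket for their expiry second; advancing
 * the clock frees one bucket's worth of expired items. Every item is
 * enqueued once and freed once, so GC is O(1) amortized per item, with
 * no scan over live data. Toy constraints: TTLs under NBUCKETS seconds,
 * and a real version needs doubly-linked chains so fetched or deleted
 * items can be unlinked -- which is where the 16 bytes per item go. */
#include <stdlib.h>

#define NBUCKETS 3600  /* covers TTLs up to one hour in this toy */

struct gc_item {
    struct gc_item *next;  /* intrusive bucket chain */
    long exptime;          /* absolute expiry, in seconds */
    /* ... key, value, LRU links would live here ... */
};

static struct gc_item *buckets[NBUCKETS];
static long clock_now;

static void gc_track(struct gc_item *it) {
    struct gc_item **head = &buckets[it->exptime % NBUCKETS];
    it->next = *head;
    *head = it;
}

static void gc_tick(void) {  /* called once per second */
    struct gc_item **head = &buckets[++clock_now % NBUCKETS];
    while (*head) {
        struct gc_item *it = *head;
        *head = it->next;
        free(it);  /* expired: memory goes back without any LRU walk */
    }
}

int main(void) {
    struct gc_item *it = malloc(sizeof *it);
    it->exptime = clock_now + 5;             /* a 5-second TTL */
    gc_track(it);
    for (int i = 0; i < 10; i++) gc_tick();  /* freed at tick 5 */
    return 0;
}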

Sorry for thread hijacking, and all.


Re: Using PCIe SSDs instead of RAM

2010-07-25 Thread dormando



It's not hard to make it work; it's hard to make it work for everyone.
There are lots of things that I could add to memcached in a day each, but
they would make it less accessible instead of more accessible at the end
of the day.


Re: Using PCIe SSDs instead of RAM

2010-07-23 Thread Jakub Łopuszański
While I agree with most of your thesis, I can't see how GC is against the
LRU.

I agree that often-accessed keys with short TTLs seem strange, and so do
rarely accessed keys with long TTLs. But there are lots of perfectly good
reasons to have such situations, and we do.
GC does not work against the LRU (at least I can't see it); it cooperates
with it. With GC the LRU is hardly ever exercised, because you have a
smaller chance of running out of memory, but I'd like to answer the doubts
of Brian Moon: in case the whole memory is occupied you will not get a
"sudden lack of memory", but just the usual thing: the LRU will start to
evict the oldest items.
I agree that monitoring hitrates and evictions makes sense, but you can
forecast problems much sooner if you monitor the number of unexpired items
as well.
The point is: GC does not forbid you from using your regular monitoring
tools, skills, and procedures. It just gives you another tool: live
monitoring of unexpired items.
I see nothing bad about it :)

Scenario 1. You are releasing a new feature, and you want to scale the
number of servers according to the load. You can monitor memory usage as
the users join, extrapolate, and order new machines much sooner than by
monitoring evictions, as evictions indicate that you already have a
problem.
Scenario 2. You need to steal machines from one cluster to help build
another one, and you have to decide if you can do so safely, without
risking that the old cluster will run out of memory. Again, monitoring
evictions cannot reliably tell you how many machines you can remove from
the cluster, while monitoring memory gives you perfectly accurate info.



Re: Using PCIe SSDs instead of RAM

2010-07-23 Thread dormando
I tried.

Try the engine branch?


Re: Using PCIe SSDs instead of RAM

2010-07-23 Thread Jakub Łopuszański
On Fri, Jul 23, 2010 at 8:47 AM, dormando dorma...@rydia.net wrote:

 I tried.

 Try the engine branch?

I guess I'll have to at some point.

Just wanted to say that LRU was designed as an algorithm for a uniform cost
model, where all elements are almost equally important (have the same cost
of miss) and the only thing that distinguishes them is the pattern of
accesses. This is clearly not a good model for memcache, where some
elements are totally unimportant as they have already expired, some
elements are larger than others, some are always processed in batches
(multigets), and so on. In my opinion GC moves reality closer to the model
by removing unimportant elements, so if you want LRU to work correctly you
should at least perform GC. You could also try to modify LRU to model that
one large item actually occupies space that could be better utilized by
several small elements (this is also a simple change). If you feel
comfortable without GC, I am OK with that; just do not suggest that GC is
against LRU.


Re: Using PCIe SSDs instead of RAM

2010-07-23 Thread Ben Manes
There are alternatives to LRU, which is generally chosen for being
extremely simple to implement, fast, and reasonably good on hit rate. The
Greedy-Dual-Size-Frequency policy may be more appropriate for memcached, as
it accounts for a value's weight. I doubt that there's a lot of value in
changing the current design, but there are alternative approaches that
would need to be considered if GC were a serious consideration.
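
For reference, the published GDSF priority is pri(i) = L + F(i) * C(i) /
S(i), where L is an aging clock raised to the priority of each evicted
object. A toy sketch (illustration only, not wired into memcached's slab
LRU):

/* GDSF: evict the minimum-priority object; frequent, small, costly-
 * to-miss objects survive, and the rising clock L ages out objects
 * that were popular long ago. */
#include <stdio.h>

struct gdsf_obj {
    const char *key;
    double freq;  /* access count */
    double cost;  /* miss penalty; 1.0 for a uniform cost model */
    double size;  /* bytes */
};

static double gdsf_clock = 0.0;  /* "L", rises on each eviction */

static double priority(const struct gdsf_obj *o) {
    return gdsf_clock + o->freq * o->cost / o->size;
}

static int pick_victim(const struct gdsf_obj *objs, int n) {
    int victim = 0;
    for (int i = 1; i < n; i++)
        if (priority(&objs[i]) < priority(&objs[victim]))
            victim = i;
    return victim;
}

int main(void) {
    struct gdsf_obj objs[] = {
        { "big-rare",   2, 1.0, 100000 },
        { "small-hot", 50, 1.0,    120 },
    };
    int v = pick_victim(objs, 2);
    gdsf_clock = priority(&objs[v]);    /* age the cache */
    printf("evict %s\n", objs[v].key);  /* -> big-rare */
    return 0;
}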






Re: Using PCIe SSDs instead of RAM

2010-07-23 Thread Dustin

On Jul 23, 11:31 am, Ben Manes ben_ma...@yahoo.com wrote:

  An engine that does this would be welcome.  :)

  A big reason storage engines were introduced a while back was so
that people with different theories of operation could implement new
storage or eviction models and have them maintain relevance as the
memcached core itself progresses forward.

  There's nobody to say you can't have your own engine for people to
try out (and perhaps even have excellent luck in different
environments), and if/when a universally better model arises, we can
change defaults.
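
To sketch what that means in practice (a deliberately simplified,
hypothetical interface -- every name below is made up; the real interface
lives in the engine branch headers and differs):

/* Hypothetical, simplified engine vtable, for illustration of the
 * storage-engine concept only: the core calls through function
 * pointers for allocation, lookup, storage, and eviction, so a GC-
 * or GDSF-based engine can replace the default without core changes. */
#include <stdbool.h>
#include <stddef.h>

typedef struct engine engine;

struct engine_ops {
    void *(*allocate)(engine *e, const void *key, size_t nkey,
                      size_t nbytes, long exptime);
    void *(*get)(engine *e, const void *key, size_t nkey);
    bool  (*store)(engine *e, void *item);
    void  (*remove)(engine *e, const void *key, size_t nkey);
    void  (*evict)(engine *e, size_t bytes_needed);  /* policy hook */
};

struct engine {
    const struct engine_ops *ops;
    void *priv;  /* engine-private state, e.g. TTL buckets */
};

/* A GC engine would implement evict() by draining expired TTL buckets
 * first; the default engine would pull from the LRU tail instead. */
int main(void) {
    struct engine e = { .ops = NULL, .priv = NULL };  /* a real engine fills ops */
    (void)e;
    return 0;
}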


Re: Using PCIe SSDs instead of RAM

2010-07-22 Thread Jakub Łopuszański
I see that my patch for garbage collection is still being ignored, and your
post gives me some idea about why that is so.
I think that RAM is a real problem, because currently (without GC) you have
no clue about how much RAM you really need. So you can end up blindly buying
more and more machines, which effectively means that multiget works worse
and worse (the client issues one big multiget, but it gets split into many
packets to many servers).
Currently we try to keep the number of servers in the cluster small, based
on the real consumption, to get more from the multiget feature.

So I believe that there is an important connection between RAM and speed,
and this connection is number of servers in the cluster.
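
That multiget spread is easy to model; a back-of-the-envelope sketch,
assuming keys hash uniformly over n servers:

/* Expected number of servers touched by one k-key multiget when keys
 * hash uniformly across n servers: n * (1 - (1 - 1/n)^k). More
 * servers means the same multiget shatters into more packets. */
#include <math.h>
#include <stdio.h>

static double servers_touched(double n, double k) {
    return n * (1.0 - pow(1.0 - 1.0 / n, k));
}

int main(void) {
    /* One 20-key multiget: */
    printf("n=5:  ~%.1f servers\n", servers_touched(5, 20));   /* ~4.9 */
    printf("n=55: ~%.1f servers\n", servers_touched(55, 20));  /* ~16.9 */
    return 0;
}

(Compile with -lm.)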

On Wed, Jul 21, 2010 at 1:51 PM, Guille -bisho- bishi...@gmail.com wrote:

 Many memcache users are more interested in latency than in huge
 amounts of memory to cache. The drive you mention is at 26µs [1] compared
 with ~22.5 ns [2], i.e. 26,000 / 22.5 ≈ 1,150x, or 3 orders of magnitude
 more.

 If your application is accessing a lot of small memcache data to
 process a page, the increased latency will be noticed. If it's just
 for caching full content (full pages) it might be interesting, but
 then you might be more interested in something like varnish, which uses
 regular disks and memory as cache.

 What is your use case?

 [1] http://www.fusionio.com/products/iodrive/?tab=specs
 [2]
 http://en.wikipedia.org/wiki/Dynamic_random_access_memory#Memory_timing

 --
 Guille -ℬḭṩḩø- bishi...@gmail.com
 :wq



Re: Using PCIe SSDs instead of RAM

2010-07-22 Thread Brian Moon



I would never, never, never want my memcached daemon's RAM usage to
fluctuate wildly. The eviction rate is a much better indicator of how well
your cache is being used.


--

Brian.

http://brian.moonspot.net/


Re: Using PCIe SSDs instead of RAM

2010-07-22 Thread Jakub Łopuszański
Well, I beg to differ.
We used to have evictions > 0, actually around 200 (per whatever munin
counts them), so we used to think that we had too few machines, and kept
adding them.
After using the patch, the memory usage dropped by 80%, and we have had no
evictions in a long time, which means that the evictions were misleading,
and happened just because the LRU sometimes kills fresh items even though
there are lots of outdated keys.

Moreover, it's not like RAM usage fluctuates wildly. It's kind of constant,
or at least periodic, so you can very accurately tell if something bad
happened, as it would be instantly visible as a deviation from yesterday's
charts. Before applying the patch, you could as well not look at the chart
at all, as it was more than certain that it always showed 100% usage, which
in my opinion gives no clue about what is actually going on.

Even if you are afraid of "wildly fluctuating" charts, you will not solve
the problem by hiding it, and this is what actually happens if you don't
have GC -- the traffic and the number of outdated keys all fluctuate, but
you just don't see it if the chart always shows 100% usage...




Re: Using PCIe SSDs instead of RAM

2010-07-22 Thread Brian Moon

On 7/22/10 2:02 PM, Jakub Łopuszański wrote:

We used to have evictions > 0, actually around 200 [...] After using the
patch, the memory usage dropped by 80%, and we have had no evictions in a
long time [...]


Let me make sure I understand your claim here. You are claiming that the 
LRU is evicting things even though there are expired items in the slabs? 
And that expired items are left in the slabs and non-expired items are 
removed from the slab by the LRU? That is your claim? I just want to be 
clear.



Even if you are afraid of "wildly fluctuating" charts, you will not solve
the problem by hiding it [...]


It has nothing to do with fear. It has to do with managing resources. A
sudden peak in evictions is much better than a sudden lack of memory on
all my memcached servers. Evictions > OOM.


--

Brian.

http://brian.moonspot.net/


Re: Using PCIe SSDs instead of RAM

2010-07-22 Thread dormando
http://code.google.com/p/memcached/wiki/NewServerMaint#Looks_Can_be_Deceiving

Think I'll write a separate page about managing memory, based off of the
slides from my mysqlconf presentation about monitoring memcached...

We're not ignoring you, the patch is against what the LRU is designed for.
Several people have argued to put garbage collection back into memcached,
but it just doesn't mix.

In the interest of being constructive, you should look back through the
mailing list for details on the storage engine branch, and if you really
want it to work, it'd be a good exercise to implement this as a custom
storage engine.

In the interest of being thorough: you proved your own patch unnecessary
by noting that the hitrate did not change. It just confirmed you weren't
having a problem.

The short notes of my slides are just:

- Note evictions over time
- Note hitrate over time
- Investigate changes to either via a traffic snapshot from maatkit,
either on your memcached server or from an app server. Or set up one app
server to log its memcached traffic. Whatever you need to do.
- Note your DB load as well, and correlate *all* of these numbers.

You'll get way more useful information out of the *flow* through memcached
than from *what's inside it*. What's inside it doesn't matter, at all!

Keep your hitrate stable, investigate what your app is doing when it
changes. If there's nothing for you to fix and the hitrate is dropping, db
load is increasing, add more memcached servers. It's really really simple.
Honestly! Looking at just one stat and making that decision is pretty
weird.
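
As a sketch of that flow-based monitoring (my own illustration; get_hits,
get_misses, and evictions are real counters from the stats command):

/* Diff two "stats" snapshots taken an interval apart and report the
 * flow: hit rate and evictions per second, not the memory picture. */
#include <stdio.h>

struct mc_stats {
    unsigned long long get_hits, get_misses, evictions;
};

static void report(struct mc_stats a, struct mc_stats b, double secs) {
    double hits   = (double)(b.get_hits - a.get_hits);
    double misses = (double)(b.get_misses - a.get_misses);
    double ev     = (double)(b.evictions - a.evictions);
    printf("hitrate: %.1f%%  evictions/s: %.2f\n",
           hits + misses > 0 ? 100.0 * hits / (hits + misses) : 0.0,
           ev / secs);
}

int main(void) {
    struct mc_stats t0 = { 1000000, 50000, 200 };
    struct mc_stats t1 = { 1060000, 54000, 260 };
    report(t0, t1, 60.0);  /* -> hitrate 93.8%, 1.00 evictions/s */
    return 0;
}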

In your case, you were seeing evictions despite 50% of your memory being
loaded with expired items. Neither of these things is a problem or even
matters, because:

- expired items are freed when they're fetched (lazily; sketched below)
- evicted items are picked off of the tail of the LRU

which means that *neither* the expired items *nor* the evicted items are
being accessed at all. You have unexpired items which are being accessed
less frequently than stuff that's being expired!
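
A minimal sketch of that lazy-expiration path (an illustration of the
mechanism, not memcached's actual code):

/* Expired items are never scanned for; the expiry check happens on
 * fetch, and memory is reclaimed right there. Expired items nobody
 * fetches again are reclaimed from the LRU tail instead. */
#include <stdlib.h>
#include <time.h>

struct cache_item {
    time_t exptime;  /* 0 means "never expires" */
    /* ... key, value, LRU links ... */
};

/* Hypothetical fetch path: an expired item is a miss, freed on the spot. */
static struct cache_item *item_get(struct cache_item *it) {
    if (it && it->exptime != 0 && it->exptime <= time(NULL)) {
        free(it);  /* reclaimed on access, not by a background sweep */
        return NULL;
    }
    return it;
}

int main(void) {
    struct cache_item *it = malloc(sizeof *it);
    it->exptime = time(NULL) - 1;          /* already expired */
    return item_get(it) == NULL ? 0 : 1;   /* a miss; memory reclaimed */
}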

It *could* indicate a problem, but simply garbage collecting will actually
*hide* it from you! You'll find it by analyzing your misses and sets. You
might then see that your app is uselessly setting hundreds of keys every
time a user loads their profile, or frontpage, or whatever. Those keys
then expire without ever being used again.

That should lead you into a *real* benefit of not wasting time setting
extraneous keys, or fetching keys that never exist, or finding places to
combine data or issue multigets more correctly.

With respect to your multiget note, I went over this in quite a bit of
detail: http://dormando.livejournal.com/521163.html

If you're multiget'ing related data, there's zero reason for it to hit
more than one memcached instance. Except maybe you're fetching mass
numbers of huge keys and it makes more sense for the TCP sessions to be
split up in parallel. I dunno.
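
One common way to get exactly that (a client-side sketch; this is a
key-naming trick, not a memcached server feature): hash only the key's
group prefix when picking the server, so related keys colocate and one
multiget stays one round trip:

/* Pick the server from a hash of everything before the key's last
 * ':' so user:1234:name and user:1234:friends land on one instance.
 * Plain modulo hashing for brevity; real clients typically use their
 * own (often consistent) hashing scheme. */
#include <stdio.h>
#include <string.h>

static unsigned long hash_bytes(const char *s, size_t n) {
    unsigned long h = 5381;  /* djb2 */
    for (size_t i = 0; i < n; i++)
        h = h * 33 + (unsigned char)s[i];
    return h;
}

static int pick_server(const char *key, int nservers) {
    const char *cut = strrchr(key, ':');  /* group = key minus last part */
    size_t n = cut ? (size_t)(cut - key) : strlen(key);
    return (int)(hash_bytes(key, n) % (unsigned long)nservers);
}

int main(void) {
    /* Same group prefix, same server: one multiget, one destination. */
    printf("%d %d\n", pick_server("user:1234:name", 55),
                      pick_server("user:1234:friends", 55));
    return 0;
}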

In one final note, I'd really really appreciate it if you could stop
hijacking threads to promote your patch. It's pretty rude, as your garbage
collector issue has been discussed on the list several times.


Re: Using PCIe SSDs instead of RAM

2010-07-14 Thread David Raccah
Can you also send me your patch?  We have been waiting for the storage
engine, but we are not close to maxing out our systems yet.

Thanks,
David



Re: Using PCIe SSDs instead of RAM

2010-07-13 Thread Mitch
Hi Marten!

I have developed a patch for memcached 1.4.x that splits memcached's
slab store into metadata and data bits, so that the key/values can
live on flash without a tremendous performance penalty.  Ultimately, I
predict the best solution will be to use the storage engine branch and/
or Northscale's membase, but for the time being the patch works pretty
well.  I'll send you a private email with more info.

thanks!
Mitch (from Fusion-io)
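
A hedged sketch of the split described above (my illustration only; the
actual patch surely differs): per-item metadata and the hash/LRU
bookkeeping stay in RAM, while the bulky key/value bytes live at an offset
on the flash device and are read back on a hit:

/* Metadata/data split, sketched: RAM holds a small fixed-size record
 * per item; the payload sits on the PCIe flash device and costs one
 * pread() per hit instead of a RAM pointer chase. */
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

struct item_meta {       /* lives in RAM, in the slab/hash table */
    uint64_t key_hash;
    uint32_t exptime;
    uint32_t nbytes;     /* payload length on flash */
    off_t    flash_off;  /* where the key+value bytes were written */
};

ssize_t read_value(int flash_fd, const struct item_meta *m, void *buf) {
    return pread(flash_fd, buf, m->nbytes, m->flash_off);
}

int main(void) {
    printf("RAM cost per item: %zu bytes + hash/LRU links\n",
           sizeof(struct item_meta));
    return 0;
}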



Using PCIe SSDs instead of RAM

2010-07-09 Thread Marten Lehmann
Hello,

I know that memcached is designed to get its speed from the fast
access to RAM. But RAM is still very expensive - even with the amount
of RAM you get for the same money increasing every year.

When I thought of using PCIe SSDs instead of RAM, I wasn't doing this
with regard to persistence of objects. I just noticed that Fusion-io's
ioDrives work at near-RAM speed, with the PCIe bus as the only
bottleneck (don't mix them up with SATA SSDs). An ioDrive 160 GB with
SLC memory is available for less than $6,000 and is capable of
performing more than 100,000 random IOPS (read and write), whereas with
ECC RAM you'd have to pay a multiple of that amount to get the same
resources.

I don't know of any way to use a block device (like the ioDrive) as
RAM; you can only use RAM as a block device (which doesn't help in
this situation). So for the emerging market of PCIe SSDs (many
high-performance databases are using them as a replacement for RAID 10
arrays and large RAM) it would be necessary to extend or branch
memcached to support SSD block devices.

Did someone start on that? Is this possibly already on the roadmap,
or did the maintainers decline to extend memcached with this option
for a reason?

Btw.: We are using memcached in conjunction with nginx as a web proxy
to our backend webservers to cache images and other static files,
which improves performance a lot. But 64 GB of RAM is much more
expensive than 160 GB of an ioDrive PCIe SSD.

Kind regards
Marten