Re: Using PCIe SSDs instead of RAM

dormando Thu, 22 Jul 2010 23:47:13 -0700

I tried.

Try the engine branch?


On Fri, 23 Jul 2010, Jakub Łopuszański wrote:

> While I agree with most of your thesis, I can't see how GC is against the LRU.
>
> I agree that frequently accessed keys with a short TTL seem strange, and so do
> rarely accessed keys with a long TTL. But there are plenty of good reasons to
> end up in such a situation, and we do.
> GC does not work against the LRU (at least I can't see how); it cooperates with it.
> Apparently the LRU is then never used, because you are less likely to run out of
> memory, but I'd like to address Brian Moon's doubts:
> if the whole memory is occupied you will not get a "sudden lack of memory",
> just the usual thing: the LRU will start to evict the oldest items.
> I agree that monitoring hitrates and evictions makes sense, but you can
> forecast problems much sooner if you also monitor the number of unexpired
> items.
> The point is: GC does not forbid you from using your regular monitoring 
> tools, skills and procedures. It just gives you another tool: live monitoring 
> of unexpired items.
> I see nothing bad about it:)
>
> Scenario 1. You are releasing a new feature, and you want to scale the number
> of servers according to the load. You can monitor memory usage as the users
> join, extrapolate, and order new machines much sooner than by monitoring
> evictions, as evictions indicate that you already have a problem.
> Scenario 2. You need to steal machines from one cluster to help build another
> one, and you have to decide whether you can do so safely without risking that
> the old cluster will "run out of memory". Again, monitoring evictions cannot
> reliably tell you how many machines you can remove from the cluster, while
> monitoring memory gives you perfectly accurate info.
>
>
> On Fri, Jul 23, 2010 at 12:12 AM, dormando <dorma...@rydia.net> wrote:
>       http://code.google.com/p/memcached/wiki/NewServerMaint#Looks_Can_be_Deceiving
>
>       Think I'll write a separate page about managing memory, based off of the
>       slides from my mysqlconf presentation about monitoring memcached...
>
>       We're not ignoring you, the patch is against what the LRU is designed
>       for. Several people have argued to put garbage collection back into
>       memcached, but it just doesn't mix.
>
>       In the interest of being constructive, you should look back through the
>       mailing list for details on the storage engine branch, and if you really
>       want it to work, it'd be a good exercise to implement this as a custom
>       storage engine.
>
>       In the interest of being thorough; you proved your own patch unnecessary
>       by noting that the hitrate did not change. It just confirmed you weren't
>       having a problem.
>
>       The short notes of my slides are just:
>
>       - Note evictions over time
>       - Note hitrate over time
>       - Investigate changes to either via a traffic snapshot from maatkit,
>       either on your memcached server or from an app server. Or set up one app
>       server to log its memcached traffic. Whatever you need to do.
>       - Note your DB load as well, and correlate *all* of these numbers.
>
>       You'll get way more useful information out of the *flow* through
>       memcached than from *what's inside it*. What's inside it doesn't
>       matter, at all!
>
>       Keep your hitrate stable, investigate what your app is doing when it
>       changes. If there's nothing for you to fix and the hitrate is dropping,
>       db load is increasing, add more memcached servers. It's really really
>       simple. Honestly! Looking at just one stat and making that decision is
>       pretty weird.
>
>       In your case, you were seeing evictions despite 50% of your memory being
>       loaded with expired items. Neither of these things is a problem or even
>       matters, because:
>
>       - expired items are freed when they're fetched
>       - evicted items are picked off of the tail of the LRU
>
>       which means that *neither* the expired items nor the evicted items are
>       being accessed at all. You have unexpired items which are being accessed
>       less frequently than stuff that's being expired!
>
>       It *could* indicate a problem, but simply garbage collecting will
>       actually *hide* it from you! You'll find it by analyzing your misses
>       and sets. You might then see that your app is uselessly setting
>       hundreds of keys every time a user loads their profile, or frontpage,
>       or whatever. Those keys then expire without ever being used again.
>
>       That should lead you into a *real* benefit of not wasting time setting
>       extraneous keys, or fetching keys that never exist, or finding places to
>       combine data or issue multigets more correctly.
>
>       With respect to your multiget note, I went over this in quite a bit of
>       detail: http://dormando.livejournal.com/521163.html
>
>       If you're multiget'ing related data, there's zero reason for it to hit
>       more than one memcached instance. Except maybe you're fetching mass
>       numbers of huge keys and it makes more sense for the TCP sessions to be
>       split up in parallel. I dunno.
>
>       In one final note, I'd really really appreciate it if you could stop
>       hijacking threads to promote your patch. It's pretty rude, as your
>       garbage collector issue has been discussed on the list several times.
>
> On Thu, 22 Jul 2010, Jakub Łopuszański wrote:
>
> > Well, I beg to differ.
> > We used to have evictions > 0, actually around 200 (in whatever units munin
> > counts them), so we thought that we had too few machines, and kept adding
> > them.
> > After applying the patch, memory usage dropped by 80%, and we have had no
> > evictions for a long time, which means that the evictions were misleading
> > and happened just because the LRU sometimes kills fresh items even though
> > there are lots of outdated keys.
> >
> > Moreover, it's not like RAM usage "fluctuates wildly". It's more or less
> > constant, or at least periodic, so you can very accurately tell if something
> > bad happened, as it would be instantly visible as a deviation from
> > yesterday's charts. Before applying the patch, you might as well not look at
> > the chart at all, as it was certain to show 100% usage, which in my opinion
> > gives no clue about what is actually going on.
> >
> > Even if you are afraid of "wildly fluctuating" charts, you will not solve
> > the problem by hiding it, and this is what actually happens if you don't
> > have GC -- the traffic, the number of outdated keys, they all fluctuate,
> > but you just don't see it if the chart always shows 100% usage...
> >
> > 2010/7/22 Brian Moon <br...@moonspot.net>
> >       On 7/22/10 5:46 AM, Jakub Łopuszański wrote:
> >             I see that my patch for garbage collection is still being ignored, and
> >             your post gives me some idea about why it is so.
> >             I think that RAM is a real problem, because currently (without GC) you
> >             have no clue about how much RAM you really need. So you can end up
> >             blindly buying more and more machines, which effectively means that
> >             multiget works worse and worse (client issues one big multiget but it
> >             gets split into many packets to many servers).
> >             Currently we try to make the number of servers in the cluster smaller,
> >             based on the real consumption, to get more from the multiget feature.
> >
> >
> > I would never, never, never want my memcached daemon ram usage to fluctuate 
> > wildly. Eviction rate is a much better determination of how well your cache 
> > is being used.
> >
> > --
> >
> > Brian.
> > --------
> > http://brian.moonspot.net/
> >
> >
> >
> >
>
>
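The monitoring notes quoted above (note evictions over time, note hitrate over time) could be sketched as a comparison of two snapshots of memcached's `stats` output. This is only a rough sketch and not something posted in the thread: the helper names `parse_stats` and `rates_between` and the sample numbers are made up for illustration, while `get_hits`, `get_misses` and `evictions` are counters that the `stats` command reports.

```python
def parse_stats(text):
    """Parse 'STAT <name> <value>' lines into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "STAT" and parts[2].isdigit():
            stats[parts[1]] = int(parts[2])
    return stats


def rates_between(before, after, seconds):
    """Hit rate and evictions per second over the interval between two snapshots."""
    hits = after["get_hits"] - before["get_hits"]
    misses = after["get_misses"] - before["get_misses"]
    gets = hits + misses
    return {
        "hitrate": hits / gets if gets else None,
        "evictions_per_sec": (after["evictions"] - before["evictions"]) / seconds,
    }


# Made-up sample snapshots taken 60 seconds apart (illustrative numbers only).
SNAP_1 = "STAT get_hits 9000\nSTAT get_misses 1000\nSTAT evictions 200\nEND"
SNAP_2 = "STAT get_hits 9900\nSTAT get_misses 1100\nSTAT evictions 260\nEND"

result = rates_between(parse_stats(SNAP_1), parse_stats(SNAP_2), 60)
print(result)
```

Working from deltas rather than the raw counters is what makes this about the *flow* through memcached: the counters only ever grow, so a single reading says little, while the change per interval can be charted next to DB load.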
>
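The two bullet points quoted above (expired items are freed when they're fetched; evicted items are picked off the tail of the LRU) can be modelled with a toy cache. This is a simplified model under stated assumptions, not memcached's actual implementation: capacity is counted in items rather than slab memory, the class `ToyCache` and its methods are hypothetical, time is passed in explicitly, and counting an eviction only when the displaced item is still unexpired is an assumption of the model.

```python
from collections import OrderedDict


class ToyCache:
    """Item-count-limited LRU with lazy expiry (a toy model, not memcached)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # first entry = LRU tail, last = most recent
        self.evictions = 0

    def set(self, key, value, ttl, now):
        if key in self.items:
            del self.items[key]
        elif len(self.items) >= self.capacity:
            # Reuse the slot at the LRU tail, whatever sits there.
            _, (_, expires_at) = self.items.popitem(last=False)
            if now < expires_at:
                self.evictions += 1  # an unexpired item was pushed out
        self.items[key] = (value, now + ttl)

    def get(self, key, now):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self.items[key]  # lazy expiry: freed only when fetched
            return None
        self.items.move_to_end(key)  # a hit moves the item to the head
        return value


cache = ToyCache(capacity=4)
cache.set("old1", "a", ttl=1000, now=0)
cache.set("old2", "b", ttl=1000, now=0)
cache.set("tmp1", "c", ttl=1, now=1)
cache.set("tmp2", "d", ttl=1, now=1)
# By now=10 both tmp items have expired but still occupy slots,
# because nobody has fetched them.
cache.set("new", "e", ttl=1000, now=10)
print(cache.evictions, list(cache.items))
```

In this model the unexpired `old1` is the item that gets pushed out, while the expired `tmp1` and `tmp2` stay resident until something fetches them. That is the situation described in the thread: evictions alongside expired items, with neither group being accessed.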
>
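The multiget discussion quoted above (one big multiget gets split across more servers as the cluster grows, versus keeping related data on one instance) can be illustrated with a small sketch. It assumes a plain CRC32-modulo key distribution, which is a stand-in and not necessarily what any particular client library does; `server_for`, `server_for_group` and `servers_touched` are hypothetical helpers, and the key names are made up.

```python
import zlib


def server_for(key, n_servers):
    """Pick a server index with CRC32 modulo (an assumed, simplified scheme)."""
    return zlib.crc32(key.encode("utf-8")) % n_servers


def server_for_group(key, n_servers):
    """Hash only the part before the last ':' so related keys share a server."""
    group = key.rsplit(":", 1)[0]
    return zlib.crc32(group.encode("utf-8")) % n_servers


def servers_touched(keys, n_servers, pick=server_for):
    """Number of distinct servers a single multiget of `keys` would contact."""
    return len({pick(k, n_servers) for k in keys})


# Twenty made-up keys that all belong to the same user.
keys = ["user:42:field%d" % i for i in range(20)]

for n in (1, 4, 16, 64):
    print(n, "servers:",
          servers_touched(keys, n), "contacted when hashing the full key,",
          servers_touched(keys, n, server_for_group), "when hashing the group")
```

Hashing the full key lets the fan-out of one multiget grow with the cluster size, up to one server per key. Hashing a shared prefix keeps the whole batch on a single instance regardless of how many servers there are.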
