Re: Using PCIe SSDs instead of RAM

dormando Thu, 22 Jul 2010 15:13:04 -0700

http://code.google.com/p/memcached/wiki/NewServerMaint#Looks_Can_be_Deceiving

Think I'll write a separate page about managing memory, based off of the
slides from my mysqlconf presentation about monitoring memcached...

We're not ignoring you; the patch goes against what the LRU is designed for.
Several people have argued to put garbage collection back into memcached,
but it just doesn't mix.

In the interest of being constructive, you should look back through the
mailing list for details on the storage engine branch, and if you really
want it to work, it'd be a good exercise to implement this as a custom
storage engine.

In the interest of being thorough: you proved your own patch unnecessary
by noting that the hitrate did not change. That just confirmed you weren't
having a problem.

The short version of my slides is just:

- Note evictions over time
- Note hitrate over time
- Investigate changes to either via a traffic snapshot from maatkit,
taken either on your memcached server or from an app server. Or set up one
app server to log its memcached traffic. Whatever you need to do.
- Note your DB load as well, and correlate *all* of these numbers.
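As a rough illustration of noting evictions and hitrate over time, here is a minimal Python sketch. It assumes you periodically capture the raw output of memcached's text-protocol `stats` command (lines like `STAT get_hits 1234`) and turns two snapshots into per-interval numbers. `get_hits`, `get_misses`, `evictions` and `cmd_set` are standard memcached stats; `parse_stats` and `interval_summary` are hypothetical helper names, not part of any client library.

```python
# Sketch: per-interval hitrate and evictions from two "stats" snapshots.
# parse_stats / interval_summary are hypothetical helpers, not a library API.


def parse_stats(raw: str) -> dict:
    """Parse 'STAT <name> <value>' lines into {name: int}.

    Stats whose value is not a plain integer (e.g. 'version',
    'rusage_user') are skipped, as are 'END' and blank lines.
    """
    stats = {}
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "STAT" and parts[2].isdigit():
            stats[parts[1]] = int(parts[2])
    return stats


def interval_summary(before: dict, after: dict) -> dict:
    """Hitrate, evictions and sets for the interval between two snapshots.

    Using deltas shows the flow during the interval instead of the
    lifetime counters accumulated since the server started.
    """
    hits = after["get_hits"] - before["get_hits"]
    misses = after["get_misses"] - before["get_misses"]
    gets = hits + misses
    return {
        "hitrate": hits / gets if gets else 0.0,
        "evictions": after["evictions"] - before["evictions"],
        "sets": after["cmd_set"] - before["cmd_set"],
    }


if __name__ == "__main__":
    t0 = parse_stats("STAT get_hits 900\r\nSTAT get_misses 100\r\n"
                     "STAT evictions 5\r\nSTAT cmd_set 200\r\nEND\r\n")
    t1 = parse_stats("STAT get_hits 1800\r\nSTAT get_misses 300\r\n"
                     "STAT evictions 25\r\nSTAT cmd_set 500\r\nEND\r\n")
    print(interval_summary(t0, t1))
```

Counters reset when a server restarts, so a real collector would also need to handle negative deltas.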

You'll get way more useful information out of the *flow* through memcached
than from *what's inside it*. What's inside it doesn't matter, at all!

Keep your hitrate stable, and investigate what your app is doing when it
changes. If there's nothing for you to fix, and the hitrate is dropping
while DB load is increasing, add more memcached servers. It's really
really simple. Honestly! Making that decision by looking at just one stat
is pretty weird.
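To make correlating hitrate and DB load concrete, here is a minimal sketch. It assumes you already collect one hitrate sample and one DB load sample (for example, queries per second) per interval. `pearson` and `hitrate_dropping_db_rising` are hypothetical names. The check only flags the pattern described above; per the advice here, you would still look at what the app is doing before adding servers.

```python
from math import sqrt


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series of at least 2 points")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a flat series gives no correlation signal
    return cov / sqrt(vx * vy)


def hitrate_dropping_db_rising(hitrates: list, db_loads: list) -> bool:
    """Flag a window where hitrate fell, DB load rose, and they moved inversely."""
    return (
        hitrates[-1] < hitrates[0]
        and db_loads[-1] > db_loads[0]
        and pearson(hitrates, db_loads) < 0
    )
```

A stable hitrate with a noisy DB load does not trip the flag, which matches the advice not to react to a single stat in isolation.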

In your case, you were seeing evictions despite 50% of your memory being
loaded with expired items. Neither of these things is a problem or even
matters, because:

- expired items are freed when they're fetched
- evicted items are picked off of the tail of the LRU

which means that *neither* the expired items nor the evicted items are
being accessed at all. You have unexpired items that are being accessed
less frequently than stuff that's being expired!
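A toy model of the two behaviours listed above may help. It is a simplified single-list Python sketch, not memcached's actual slab allocator (which keeps a separate LRU per slab class), and `ToyLRU` is a hypothetical name. Expired entries stay in memory until something fetches them. When the cache is full, the least recently used entry is dropped from the tail, so anything that keeps getting fetched stays put.

```python
import time
from collections import OrderedDict


class ToyLRU:
    """Toy LRU, NOT memcached's implementation.

    The front of the OrderedDict is the LRU tail (least recently used);
    the back is the head (most recently set or fetched).
    """

    def __init__(self, capacity: int, clock=time.monotonic):
        self.capacity = capacity
        self.clock = clock          # injectable so expiry can be simulated
        self.items = OrderedDict()  # key -> (value, expires_at)
        self.evictions = 0          # counts tail removals in this toy

    def set(self, key, value, ttl: float) -> None:
        if key in self.items:
            del self.items[key]
        elif len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # drop the tail entry
            self.evictions += 1
        self.items[key] = (value, self.clock() + ttl)  # new entry at head

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.items[key]  # expired: reclaimed lazily, on fetch
            return None
        self.items.move_to_end(key)  # accessed: bump to head
        return value
```

With a fake clock you can watch an expired key sit in `items` until the first `get` for it, and watch an untouched key get evicted while a frequently read one survives.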

It *could* indicate a problem, but simply garbage collecting will actually
*hide* it from you! You'll find it by analyzing your misses and sets. You
might then see that your app is uselessly setting hundreds of keys every
time a user loads their profile, or frontpage, or whatever. Those keys
then expire without ever being used again.

That should lead you to the *real* benefit: not wasting time setting
extraneous keys or fetching keys that never exist, and finding places to
combine data or issue multigets more correctly.
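One way to spot keys that are set and then expire unused is a minimal sketch like the one below. It assumes you have already turned a traffic capture (from maatkit or an app server's log) into a list of `(command, key)` pairs. That log shape and the `wasted_sets` name are hypothetical.

```python
from collections import Counter


def wasted_sets(log: list) -> list:
    """Return [(key, times_set), ...] for keys set but never fetched.

    `log` is assumed to be a list of (command, key) pairs pulled from a
    traffic capture; a multi-key get should already be expanded into one
    ("get", key) pair per key. Most frequently set keys come first.
    """
    set_counts = Counter(key for cmd, key in log if cmd == "set")
    fetched = {key for cmd, key in log if cmd in ("get", "gets")}
    return [(key, n) for key, n in set_counts.most_common() if key not in fetched]
```

A capture only covers a window, so a key set near the end may be fetched later. Treat the output as candidates to investigate in the app, not proof.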

With respect to your multiget note, I went over this in quite a bit of
detail: http://dormando.livejournal.com/521163.html

If you're multiget'ing related data, there's zero reason for it to hit
more than one memcached instance. The exception might be if you're
fetching huge numbers of large keys and it makes more sense for the TCP
sessions to be split up in parallel. I dunno.
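To keep related data on one instance, the usual idea is to hash a shared part of the key rather than the whole key. Here is a minimal sketch, assuming your client lets you control what gets hashed. The server list, the `user:<id>:<field>` key scheme and all function names are hypothetical. It uses plain modulo hashing for brevity; many clients offer consistent hashing instead, which remaps fewer keys when servers are added.

```python
import zlib
from typing import Callable

SERVERS = ["mc1:11211", "mc2:11211", "mc3:11211"]  # hypothetical pool


def server_for(hash_key: str, servers: list = SERVERS) -> str:
    # Modulo hashing for illustration only.
    return servers[zlib.crc32(hash_key.encode()) % len(servers)]


def user_group(key: str) -> str:
    # Hypothetical scheme "user:<id>:<field>": hash only "user:<id>".
    return ":".join(key.split(":")[:2])


def group_by_server(keys: list, group_of: Callable[[str], str]) -> dict:
    """Bucket keys by the server chosen from their group, not the full key."""
    buckets: dict = {}
    for key in keys:
        buckets.setdefault(server_for(group_of(key)), []).append(key)
    return buckets
```

Because every `user:42:*` key hashes on `user:42`, a multiget for one user's data turns into a single request to a single server.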

On one final note, I'd really really appreciate it if you could stop
hijacking threads to promote your patch. It's pretty rude, as your garbage
collector issue has been discussed on the list several times.

On Thu, 22 Jul 2010, Jakub Łopuszański wrote:

> Well, I beg to differ.
> We used to have evictions > 0, actually around 200 (per whatever munin counts
> them), so we thought that we had too few machines and kept adding them.
> After applying the patch, memory usage dropped by 80%, and we have had no
> evictions for a long time, which means that the evictions were misleading
> and happened only because the LRU sometimes kills fresh items, even though
> there are lots of outdated keys.
>
> Moreover, it's not like RAM usage "fluctuates wildly". It's roughly constant,
> or at least periodic, so you can say very accurately whether something bad
> happened, as it would be instantly visible as a deviation from yesterday's
> charts. Before applying the patch, you might as well not have looked at the
> chart at all, as it was certain to always show 100% usage, which in my
> opinion gives no clue about what is actually going on.
>
> Even if you are afraid of "wildly fluctuating" charts, you will not solve the
> problem by hiding it, and hiding it is what actually happens if you don't
> have GC -- the traffic and the number of outdated keys all fluctuate, but you
> just don't see it if the chart always shows 100% usage...
>
> 2010/7/22 Brian Moon <br...@moonspot.net>
>       On 7/22/10 5:46 AM, Jakub Łopuszański wrote:
>             I see that my patch for garbage collection is still being
>             ignored, and your post gives me some idea about why it is so.
>             I think that RAM is a real problem, because currently (without
>             GC) you have no clue about how much RAM you really need. So you
>             can end up blindly buying more and more machines, which
>             effectively means that multiget works worse and worse (the
>             client issues one big multiget but it gets split into many
>             packets to many servers).
>             Currently we try to reduce the number of servers in the cluster
>             based on the real consumption, to get more out of the multiget
>             feature.
>
>
> I would never, never, never want my memcached daemon ram usage to fluctuate 
> wildly. Eviction rate is a much better determination of how well your cache 
> is being used.
>
> --
>
> Brian.
> --------
> http://brian.moonspot.net/