I tried. Try the engine branch?
On Fri, 23 Jul 2010, Jakub Łopuszański wrote:

> While I agree with most of your thesis, I can't see how GC is against
> the LRU.
>
> I agree that often accessed keys with a short TTL seem strange, and so
> do rarely accessed keys with a long TTL. But there are lots of perfectly
> good reasons to have such a situation, and we do.
> GC does not work against the LRU (at least I can't see it), it
> cooperates.
> Apparently LRU is never used, because you have smaller chances to run
> out of memory, but I'd like to answer the doubts of Brian Moon:
> in case the whole memory is occupied you will not get a "sudden lack of
> memory", but just the usual thing: LRU will start to evict the oldest
> items.
> I agree that monitoring hitrates and evictions makes sense, but you can
> forecast problems much sooner if you monitor the number of unexpired
> items as well.
> The point is: GC does not forbid you from using your regular monitoring
> tools, skills and procedures. It just gives you another tool: live
> monitoring of unexpired items.
> I see nothing bad about it :)
>
> Scenario 1. You are releasing a new feature, and you want to scale the
> number of servers according to the load. You can monitor memory usage
> as the users join, extrapolate, and order new machines much sooner than
> by monitoring evictions, as evictions indicate that you already have a
> problem.
> Scenario 2. You need to steal machines from one cluster to help build
> another one, and you have to decide if you can do so safely without
> risking that the old cluster will "run out of memory". Again,
> monitoring evictions cannot reliably tell you how many machines you can
> remove from the cluster, while monitoring memory gives you perfectly
> accurate info.
>
>
> On Fri, Jul 23, 2010 at 12:12 AM, dormando <dorma...@rydia.net> wrote:
>
> http://code.google.com/p/memcached/wiki/NewServerMaint#Looks_Can_be_Deceiving
>
> Think I'll write a separate page about managing memory, based off of
> the slides from my mysqlconf presentation about monitoring memcached...
>
> We're not ignoring you, the patch is against what the LRU is designed
> for. Several people have argued to put garbage collection back into
> memcached, but it just doesn't mix.
>
> In the interest of being constructive, you should look back through the
> mailing list for details on the storage engine branch, and if you
> really want it to work, it'd be a good exercise to implement this as a
> custom storage engine.
>
> In the interest of being thorough; you proved your own patch
> unnecessary by noting that the hitrate did not change. It just
> confirmed you weren't having a problem.
>
> The short notes of my slides are just:
>
> - Note evictions over time
> - Note hitrate over time
> - Investigate changes to either via a traffic snapshot from maatkit,
>   either on your memcached server or from an app server. Or set up one
>   app server to log its memcached traffic. Whatever you need to do.
> - Note your DB load as well, and correlate *all* of these numbers.
>
> You'll get way more useful information out of the *flow* through
> memcached than from *what's inside it*. What's inside it doesn't
> matter, at all!
>
> Keep your hitrate stable, investigate what your app is doing when it
> changes. If there's nothing for you to fix and the hitrate is dropping,
> db load is increasing, add more memcached servers. It's really really
> simple. Honestly! Looking at just one stat and making that decision is
> pretty weird.
>
> In your case, you were seeing evictions despite 50% of your memory
> being loaded with expired items. Neither of these things is a problem
> or even matters, because:
>
> - expired items are freed when they're fetched
> - evicted items are picked off of the tail of the LRU
>
> which means that *neither* the expired items nor the evicted items are
> being accessed at all. You have unexpired items which are being
> accessed less frequently than stuff that's being expired!
>
> It *could* indicate a problem, but simply garbage collecting will
> actually *hide* it from you! You'll find it by analyzing your misses
> and sets. You might then see that your app is uselessly setting
> hundreds of keys every time a user loads their profile, or frontpage,
> or whatever. Those keys then expire without ever being used again.
>
> That should lead you into a *real* benefit of not wasting time setting
> extraneous keys, or fetching keys that never exist, or finding places
> to combine data or issue multigets more correctly.
>
> With respect to your multiget note, I went over this in quite a bit of
> detail: http://dormando.livejournal.com/521163.html
>
> If you're multiget'ing related data, there's zero reason for it to hit
> more than one memcached instance. Except maybe you're fetching mass
> numbers of huge keys and it makes more sense for the TCP sessions to be
> split up in parallel. I dunno.
>
> In one final note, I'd really really appreciate it if you could stop
> hijacking threads to promote your patch. It's pretty rude, as your
> garbage collector issue has been discussed on the list several times.
>
> On Thu, 22 Jul 2010, Jakub Łopuszański wrote:
>
> > Well, I beg to differ.
> > We used to have evictions > 0, actually around 200 (per whatever
> > munin counts them), so we used to think that we had too small a
> > number of machines, and kept adding them.
> > After using the patch, the memory usage dropped by 80%, and we have
> > had no evictions for a long time, which means that evictions were
> > misleading, and happened just because LRU sometimes kills fresh
> > items, even though there are lots of outdated keys.
> >
> > Moreover it's not like RAM usage "fluctuates wildly". It's kind of
> > constant, or at least periodic, so you can very accurately say if
> > something bad happened, as it would be instantly visible as a
> > deviation from yesterday's charts. Before applying the patch, you
> > could as well not look at the chart at all, as it was more than sure
> > that it always shows 100% usage, which in my opinion gives no clue
> > about what is actually going on.
> >
> > Even if you are afraid of "wildly fluctuating" charts, you will not
> > solve the problem by hiding it, and this is what actually happens if
> > you don't have GC -- the traffic, the number of outdated keys, they
> > all fluctuate, but you just don't see it, if the chart always shows
> > 100% usage...
> >
> > 2010/7/22 Brian Moon <br...@moonspot.net>
> > On 7/22/10 5:46 AM, Jakub Łopuszański wrote:
> > I see that my patch for garbage collection is still being ignored,
> > and your post gives me some idea about why it is so.
> > I think that RAM is a real problem, because currently (without GC)
> > you have no clue about how much RAM you really need. So you can end
> > up blindly buying more and more machines, which effectively means
> > that multiget works worse and worse (client issues one big multiget
> > but it gets split into many packets to many servers).
> > Currently we try to make the number of servers in the cluster
> > smaller, based on the real consumption, to get more from the
> > multiget feature.
> >
> >
> > I would never, never, never want my memcached daemon ram usage to
> > fluctuate wildly. Eviction rate is a much better determination of
> > how well your cache is being used.
> >
> > --
> >
> > Brian.
> > --------
> > http://brian.moonspot.net/
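
For anyone who wants to try the "note evictions over time / note hitrate
over time" notes quoted above, here is a minimal sketch in Python. It is
not taken from the slides or from memcached itself. The helper names
parse_stats and flow_between are hypothetical, and the two snapshots are
made-up numbers. The only things assumed about memcached are that the
"stats" command replies with "STAT <name> <value>" lines and that
get_hits, get_misses and evictions are cumulative counters, so the
interesting figures are the differences between two snapshots.

```python
def parse_stats(raw):
    """Parse the text of a memcached `stats` reply into a dict.

    Numeric values become ints; anything else (e.g. version) stays a string.
    """
    stats = {}
    for line in raw.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            name, value = parts[1], parts[2]
            stats[name] = int(value) if value.isdigit() else value
    return stats


def flow_between(before, after, seconds):
    """Summarise the flow through the cache between two snapshots.

    `before` and `after` are dicts from parse_stats, taken `seconds` apart.
    """
    hits = after["get_hits"] - before["get_hits"]
    misses = after["get_misses"] - before["get_misses"]
    evictions = after["evictions"] - before["evictions"]
    gets = hits + misses
    return {
        # None when there were no gets in the interval at all.
        "hitrate": hits / gets if gets else None,
        "evictions_per_sec": evictions / seconds,
        "misses_per_sec": misses / seconds,
    }


# Made-up snapshots, 60 seconds apart, in the shape of a `stats` reply.
SNAP_A = "STAT get_hits 9000\r\nSTAT get_misses 1000\r\nSTAT evictions 200\r\nEND\r\n"
SNAP_B = "STAT get_hits 9900\r\nSTAT get_misses 1100\r\nSTAT evictions 260\r\nEND\r\n"

flow = flow_between(parse_stats(SNAP_A), parse_stats(SNAP_B), 60)
print(flow)
```

In practice the two snapshots would come from polling each server on a
fixed interval and graphing the results next to DB load, so that a change
in one number can be correlated with the others.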