> On Sun, 13 Apr 2014, Slawomir Pryczek wrote:
>
> > So high evictions when the cleaning algo isn't enabled could be caused
> > by slab imbalance due to high-memory slabs eating most of the RAM...
> > and I just incorrectly assumed low-TTL items are expired before
> > high-TTL items, because in such cases the cache didn't have enough
> > memory to store all the low-TTL items, and both low and high TTLs were
> > evicted, interesting...
>
> yes.
>
> > So you're saying if I set some item X, to evict it I'd need to write
> > AT LEAST as many new items as X's slab contains, because the item will
> > be added at the head, and you're removing from the tail, right?
>
> yes. It's actually worse than that, since deleting items or fetching
> expired ones will make extra room, slowing it down.
Actually, even worse than that still: during an allocation the
*bottommost* item in the LRU is always checked for expiration before more
memory is assigned (this is the 'reclaimed' stat). So if you have a cache
with only items of a 60s TTL, you will stop assigning memory if you set
into the cache slower than they expire.

> > Sending some slab stats, and TTL left for slab 1 where there are no
> > evictions + slab 2 where there is plenty. Unfortunately I can't send
> > dumps as these contain some sensitive data.
> > http://img.liczniki.org/20140414/slabs_all_918-1397431358.png
>
> Can you *please* send a text dump of stats items and stats slabs? Just
> grep out or censor what you don't want to share? Doing math against a
> picture is a huge annoyance. It's also missing important counters I'd
> like to look at.
>
> > For ratio of long/short, hard to tell... but most are definitely
> > short.
> >
> > > Slab class 3 has 1875968 total chunks in your example, which means
> > > in order to cause a 120s item to evict early you need to insert into
> > > *that slab class* at a rate of 15,000 items per second, unless it's
> > > a multi-hour item instead. In which case what you said is happening
> > > but reversed: lots of junk 120s items are causing 5hr items to
> > > evict, but after many minutes and definitely not mere seconds.
> >
> > Yes, but as you can see this class only contains 15% valid data and
> > has plenty of evictions. The next class contains 7% valid data, but
> > still has 4 evictions. Probably it would be best just to keep TTLs the
> > same for all data...
>
> Your main complaint has been that 120s values don't persist for more
> than 60s; there's a 0% chance of any items in slab class 3 having a TTL
> of 120s.
>
> If you kept the TTLs all the same, what would they be? If they were all
> 120s and you rebalanced slabs, you'd probably never have a problem (but
> it seemed like you needed some data for longer).
> >
> > On Sunday, 13 April 2014 21:12:43 UTC+2, Dormando wrote:
> >
> > > > Hey Dormando...
> > > >
> > > > Some quick question first... I have checked some Intel papers on
> > > > their memcached fork, and for 1.6 it seems that there's some
> > > > rather big lock contention... have you thought about just "gluing"
> > > > individual items to a thread, using maybe the item hash or some
> > > > configurable method... this way 2 threads won't be able to access
> > > > the same item at one time. I'm wondering what would be the
> > > > problems with such an approach, because it seems rational at first
> > > > glance, instead of locking the whole cache... I'm just curious. Is
> > > > there some release plan for 1.6... I think 2-3 years ago it was in
> > > > development... are you getting closer to releasing it?
> > >
> > > The 1.4 tree has much less lock contention than 1.6. I made repeated
> > > calls for people to help pull bugs out of 1.6 and was ignored, so I
> > > ended up continuing development against 1.4... There's no
> > > whole-cache lock in 1.4; there's a whole-LRU lock, but the
> > > operations are a lot shorter. It's much much faster than the older
> > > code.
> > >
> > > > I'm not ignoring your posts, actually I read them but didn't want
> > > > my posts to be too large. Actually we tried several other things
> > > > besides memcached. After switching from mysql with some memory
> > > > tables to memcached ~2 years ago, measured throughput went from
> > > > 40-50 req/s to about 4000 r/s. Back then it was fine; then when
> > > > traffic went higher, the cache was no longer able to evict items
> > > > almost at all.
> > > >
> > > > Changing infrastructure in a project that was in development for
> > > > over 2 years is not an easy thing. We also tested some other
> > > > things like mongodb and redis back then... and we just CAN'T have
> > > > this data hitting disks. Maybe now there are more options, but we
> > > > are already considering a golang or C rewrite for this part...
> > > > We don't want to switch to some other "shared memory"-ish system,
> > > > just be able to access data directly between calls and do the
> > > > locking ourselves.
> > > >
> > > > So, again, as for the current solution, decisions about what tools
> > > > we use were made a very long time ago, and are not easy to change
> > > > now.
> > > >
> > > > > Almost instant evictions; so an item is stored, into a 16GB
> > > > > instance, and < 120 seconds later is bumped out of the LRU?
> > > >
> > > > Yes, the items we insert for TTL=120 to 500s are not able to hold
> > > > in the cache even for 60s, when we start to read/aggregate them,
> > > > and are thrown away instead of garbage. I understand why that
> > > > is...
> > > >
> > > > S - short TTL
> > > > L - long TTL
> > > > SSSSSSSLSSS -> when we insert an L item, no S items before L will
> > > > be able to get reclaimed, with the current algo, ever. Because new
> > > > L items will appear later too... for LRU to work optimally under
> > > > high loads, every item should have nearly the same TTL in a given
> > > > slab. You could try to reclaim from top and bottom, but this way
> > > > one "hold forever" item would break the whole thing as soon as it
> > > > gets to the top.
> > >
> > > I understand this part, I just find it suspicious.
> > >
> > > > Longest items in the pool are set for 5-6h. And unfortunately item
> > > > size is in no way correlated to TTL. We e.g. store UA analysis and
> > > > geo data for 5h. These items are very short, as short as e.g.
> > > > impression counters.
> > >
> > > Ok. Any idea what the ratio of long to short is? Like 10% 120s, 90%
> > > 5h, or the reverse, or whatever?
> > >
> > > > > You'll probably just ignore me again, but isn't this just slab
> > > > > imbalance?
> > > >
> > > > No it isn't... how in hell can slab imbalance happen over just 1h,
> > > > without code changes ;)
> > >
> > > I can make slab imbalance happen in under 10 seconds. Not really the
> > > point: slab pages are pulled from the global pool as-needed as
> > > memory fills.
> > > If your traffic has ebbs and flows, or tends to set a lot more
> > > items in one class than others, it will immediately fill and others
> > > will starve.
> > >
> > > > > A better way to do this is to bucket the memory by TTL. You have
> > > > > lots of pretty decent options for this (and someone else already
> > > > > suggested one)
> > > >
> > > > Sure, if we knew back then, we'd just create 3-4 memcached
> > > > instances, add some API and shard the items based on the requested
> > > > TTL.
> > >
> > > You can't do that now? That doesn't really seem that hard and
> > > doesn't change the fundamental infrastructure... It'd be a lot
> > > easier than maintaining your own fork of memcached forever, I'd
> > > think.
> > >
> > > > > The slab rebalance system lets you plug in your own algorithm by
> > > > > running the page reassignment commands manually. Then you can
> > > > > smooth out the pages to where you think they should be.
> > > >
> > > > Sure, but that's actually not my problem... the problem is that
> > > > I'm having memory full of "expired" items, so this would require
> > > > some hacking of that slab rebalance algo (am I right?)... and it
> > > > seems a little complicated to me, to be done in 3-4 days' time.
> > >
> > > Bleh.
> > >
> > > > > A better way to do this is to bucket the memory by TTL. You have
> > > > > lots of pretty decent options for this (and someone else already
> > > > > suggested one)
> > > >
> > > > Haha, sure it's better :) We'd obviously have done that if we knew
> > > > 2 years ago what we know now :)
> > > >
> > > > I actually wrote some quick code to "redirect" about 20% of the
> > > > traffic we're sending/receiving to/from memcached to my "hacked"
> > > > version... for all the times on the screens you need to subtract 5
> > > > minutes (we ran memcached, then enabled the code 5 minutes after
> > > > it was started)...
> > > > How the situation develops:
> > > > http://screencast.com/t/YZCR2uc1bd
> > > > http://screencast.com/t/pvCmbo13z6T
> > > > http://screencast.com/t/MLwgvKzaeE
> > > >
> > > > Still no evictions:
> > > > http://screencast.com/t/BJuz8l5c
> > > >
> > > > Full of 100% utilized slab classes:
> > > > http://screencast.com/t/vTlFahoR
> > > >
> > > > After cleaning using the hack, some of these classes have only 10%
> > > > non-expired items:
> > > > http://screencast.com/t/z39z7U1TNW2
> > > >
> > > > With the current item TTLs we insert into the cache, it can eat
> > > > any amount of memory. I think the problem was diagnosed correctly
> > > > as just a problem with the reclaiming algo (or our usage patterns,
> > > > looking from the other side)...
> > >
> > > That's a 2G instance, and you can clearly see that two of those slab
> > > classes only have 14 and 20 megabytes of data assigned to them.
> > > Another one has 206 megabytes but only 4 evictions. This thing is
> > > also hiding a lot of information... can we see the actual dumps?
> > >
> > > What's evicted_nonfetched? What's evicted_time? What's the full
> > > textual output of "stats slabs" and "stats items"?
> > >
> > > The thing that bothers me is that in order for a 120 second item to
> > > be evicted before it expires, you need to insert more data ahead of
> > > it than will fit in the entire slab class within that period of
> > > time:
> > >
> > > Slab class 3 has 1875968 total chunks in your example, which means
> > > in order to cause a 120s item to evict early you need to insert into
> > > *that slab class* at a rate of 15,000 items per second, unless it's
> > > a multi-hour item instead. In which case what you said is happening
> > > but reversed: lots of junk 120s items are causing 5hr items to
> > > evict, but after many minutes and definitely not mere seconds.
> > >
> > > Your other screenshots indicate that your *global* set rate is
> > > closer to 3,500.
> > > So even in this 2GB instance, and even with 5-6 hour items mixed
> > > in, 120s items should never be blown out of slab class 3. 5-6 hour
> > > items might be, but in that case we can argue that the instance is
> > > simply way too small. Your screenshot also only accounts for 400 out
> > > of the 2000 pages that should be in there. What's tying up the other
> > > 1600 pages?
> > >
> > > The classes at the bottom (11, 12) are much smaller and have far
> > > fewer pages: 48286 chunks in 11, which only requires a set rate of
> > > 402/s to blow the top off of for a 120s item.
> > >
> > > So yes, I absolutely think it's off balance or sized wrong. I also
> > > absolutely agree that doing some sort of internal cleaning would
> > > make the memory usage more efficient and reduce your evictions: I
> > > believe you here. However, it's difficult to implement without
> > > causing extra latency, again, as you've seen. It's also going to
> > > work too aggressively, since you're trying to store too many of them
> > > within a 40 megabyte subset of this 2000 megabyte cache instance!
> > >
> > > I'm also a bit worried that you may have been hitting some item
> > > immortality issues: the current master branch has fixed a few of
> > > those (and I suck for not getting it released yet). Refcount leaks
> > > and things. Perhaps your patch was ungluing the bugs accidentally by
> > > killing immortalized items.
> > >
> > > However, a properly sized instance with more carefully balanced
> > > slabs would probably operate a lot better than what you're doing
> > > now. It may also make your existing patch more efficient, as it
> > > makes it far less likely for a page-starved slab class to have to
> > > scan many items before finding expired ones. :)
> > >
> > > I've wanted to implement an LRU "skimmer" but it's not exactly easy
> > > and still relies on luck.
> > > It's what FB ended up doing, but they extended it in some ways: I
> > > think it's literally arbitrary code to match items as it crawls the
> > > LRUs from a background thread.
> > >
> > > Ultimately a combination of page rebalance + an LRU skimmer would be
> > > most optimal. Having enough memory available and properly balanced
> > > pages means your "reclaimer/garbage collector" has to work less
> > > hard. This is a universal truth in all garbage collected languages:
> > > if you don't want it to impact latency, do things to make it work
> > > less. MC's default goes to the extreme of never garbage collecting,
> > > since the use case is that items which don't ever get fetched have
> > > no value.
> > > >
> > > > On Friday, 11 April 2014 20:06:23 UTC+2, Dormando wrote:
> > > >
> > > > > On Fri, 11 Apr 2014, Slawomir Pryczek wrote:
> > > > >
> > > > > > Hi Dormando, more about the behaviour... when we're using
> > > > > > "normal" memcached 1.4.13, 16GB of memory gets exhausted in
> > > > > > ~1h, then we start to have almost instant evictions of needed
> > > > > > items (again, these items aren't really "needed" individually,
> > > > > > just when many of them get evicted it's unacceptable because
> > > > > > of how badly it affects the system)
> > > > >
> > > > > Almost instant evictions; so an item is stored, into a 16GB
> > > > > instance, and < 120 seconds later is bumped out of the LRU?
> > > > >
> > > > > You'll probably just ignore me again, but isn't this just slab
> > > > > imbalance? Once your instance fills up there're probably a few
> > > > > slab classes with way too little memory in them.
> > > > >
> > > > > 'stats slabs' shows you per-slab eviction rates, along with the
> > > > > last accessed time of an item when it was evicted. What does
> > > > > this look like on one of your full instances?
> > > > >
> > > > > The slab rebalance system lets you plug in your own algorithm by
> > > > > running the page reassignment commands manually.
> > > > > Then you can smooth out the pages to where you think they
> > > > > should be.
> > > > >
> > > > > You mention long and short TTL, but what are they exactly? 120s
> > > > > and an hour? A week?
> > > > >
> > > > > I understand your desire to hack up something to solve this, but
> > > > > as you've already seen, scanning memory to remove expired items
> > > > > is problematic: you're either going to do long walks from the
> > > > > tail, use a background thread and walk a "probe" item through,
> > > > > or walk through random slab pages looking for expired memory.
> > > > > None of these are very efficient and they tend to rely on luck.
> > > > >
> > > > > A better way to do this is to bucket the memory by TTL. You have
> > > > > lots of pretty decent options for this (and someone else already
> > > > > suggested one):
> > > > >
> > > > > - In your client, use different memcached pools for major TTL
> > > > > buckets (ie; one instance only gets long items, one only short).
> > > > > Make sure the slabs aren't imbalanced via the slab rebalancer.
> > > > >
> > > > > - Are the sizes of the items correlated with their TTL? Are 120s
> > > > > items always in a ~300 byte range while longer items tend to be
> > > > > in a different byte range? You could use length padding to shunt
> > > > > them into specific slab classes, separating them internally at
> > > > > the cost of some RAM efficiency.
> > > > >
> > > > > - A storage engine (god I wish we'd made 1.6 work...) which
> > > > > allows bucketing by TTL ranges. You'd want a smaller set of slab
> > > > > classes to not waste too much memory here, but the idea is the
> > > > > same as running multiple individual instances, except internally
> > > > > splitting the storage engine instead and storing everything in
> > > > > the same hash table.
> > > > >
> > > > > Those three options completely avoid latency problems; the first
> > > > > one requires no code modifications and will work very well.
> > > > > The third is the most work (and will be tricky due to things
> > > > > like slab rebalance, and none of the slab class identification
> > > > > code will work). I would avoid it unless I were really bored and
> > > > > wanted to maintain my own fork forever.
> > > > >
> > > > > > ~2 years ago I created another version based on that 1.4.13,
> > > > > > which does garbage collection using a custom stats handler.
> > > > > > That version is able to run on half the memory for like 2
> > > > > > weeks, with 0 evictions. But we gave it the full 16G and just
> > > > > > restart it each week to be sure memory usage is kept in check,
> > > > > > and we're not throwing away good data. Actually, after
> > > > > > changing -f1.25 to -f1.041 the slabs are filling with bad
> > > > > > items much slower, because items are distributed better and
> > > > > > this custom eviction function is able to catch more expired
> > > > > > data. We have like 200GB of data evicted this way, daily.
> > > > > > Because of the volume (~40k req/s peak, much of it writes) and
> > > > > > differences in expire time, LRU isn't able to reclaim items
> > > > > > efficiently.
> > > > > >
> > > > > > Maybe people don't even realize the problem, but when we did
> > > > > > some testing and turned off that "custom" eviction, we had
> > > > > > like 100% memory used with 10% waste reported by the memcached
> > > > > > admin. But when we then ran that custom eviction algorithm, it
> > > > > > turned out that 90% of memory was occupied by garbage. Waste
> > > > > > reported grew to 80% instantly after running an unlimited
> > > > > > "reclaim expired" on all items in the cache. So with a
> > > > > > "standard" client, when people are using different expire
> > > > > > times for items (we have it like 1 minute minimum, 6h max)...
> > > > > > they won't even be able to see how much memory they're wasting
> > > > > > in some specific cases, when they have many items that won't
> > > > > > be hit after expiration, like we have.
> > > > > > When using memcached as a buffer for mysql writes, we know
> > > > > > exactly what to hit and when. Short-TTL expired items pile up
> > > > > > near the head... long-TTL "live" items pile up near the tail,
> > > > > > and that creates a barrier that prevents the LRU algo from
> > > > > > reclaiming almost anything, if I'm understanding how it
> > > > > > currently works correctly...
> > > > > >
> > > > > > > You made it sound like you had some data which never
> > > > > > > expired? Is this true?
> > > > > >
> > > > > > Yes, I think because of how evictions are made (to be clear,
> > > > > > we're not setting non-expiring items). These short-expiring
> > > > > > items pile up at the front of the linked list; something that
> > > > > > is supposed to live for e.g. 120 or 180 seconds is lingering
> > > > > > in memory forever, until we restart the cache... and new items
> > > > > > are killed almost instantly because there are no expired items
> > > > > > at the head.
> > > > > >
> > > > > > It's a special case, because after processing the memory list,
> > > > > > aggregating the data and putting it in mysql, these items are
> > > > > > no longer touched. The list for a new time period will have a
> > > > > > completely different set of keys, as we use a prefix to
> > > > > > generate all items in the list:
> > > > > >
> > > > > > $time_slice = floor( self::$time / 60) - $time_slices_back;
> > > > > > $prefix = ")ML){$list_id}-{$time_slice}";
> > > > > >
> > > > > > Again, not saying the current implementation is bad... because
> > > > > > it's fast and doesn't trash the CPU cache when expire times
> > > > > > are ~equal; that was probably the idea... but we have an
> > > > > > atypical use case, which LRU isn't able to manage...
> > > > > >
> > > > > > Now I'm making ~the same changes I made for .13... but for
> > > > > > .17, and I want to make it work a little better ;)
> > > > > >
> > > > > > On Friday, 11 April 2014 05:12:10 UTC+2, Dormando wrote:
> > > > > >
> > > > > > > > Hey Dormando, thanks again for some comments... appreciate
> > > > > > > > the help.
> > > > > > > >
> > > > > > > > Maybe I wasn't clear enough.
> > > > > > > > I need only 1 minute of persistence, and I can lose data
> > > > > > > > sometimes; I just can't keep losing data every minute due
> > > > > > > > to constant evictions caused by the LRU. Actually I had
> > > > > > > > just written that in my previous post. We're losing about
> > > > > > > > 1 minute of non-meaningful data every week because of the
> > > > > > > > restart we do when memory starts to fill up (even with our
> > > > > > > > patch reclaiming using the linked list, we limit
> > > > > > > > reclaiming to keep speed better)... so the memory fills up
> > > > > > > > after a week, not 30 minutes...
> > > > > > >
> > > > > > > Can you explain what you're seeing in more detail? Your data
> > > > > > > only needs to persist for 1 minute, but it's being evicted
> > > > > > > before 1 minute is up?
> > > > > > >
> > > > > > > You made it sound like you had some data which never
> > > > > > > expired? Is this true?
> > > > > > >
> > > > > > > If your instance is 16GB, takes a week to fill up, but data
> > > > > > > only needs to persist for a minute and isn't, something else
> > > > > > > is very broken? Or am I still misunderstanding you?
> > > > > > >
> > > > > > > > Now I'm creating a better solution, to limit locking as
> > > > > > > > the linked list gets bigger.
> > > > > > > >
> > > > > > > > I explained the worst implications of unwanted evictions
> > > > > > > > (or losing all data in the cache) in my use case:
> > > > > > > > 1. losing ~1 minute of non-significant data that's about
> > > > > > > > to be stored in sql
> > > > > > > > 2. "flat" distribution of load to workers (not taking
> > > > > > > > response times into account because of the stats reset).
> > > > > > > > 3. resorting to an alternative targeting algorithm (with
> > > > > > > > global, not local, statistics).
> > > > > > > >
> > > > > > > > I never, ever said I'm going to write data that has to be
> > > > > > > > persistent permanently. It's actually the same idea as
> > > > > > > > delayed write. If power fails you lose 5s of data, but you
> > > > > > > > can do 100x more writes. So you need the data to be
> > > > > > > > persistent in memory; between writes the data **can't be
> > > > > > > > lost**.
> > > > > > > > However, you can lose it sometimes; that's the tradeoff
> > > > > > > > that some people can make and some can't. Obviously I
> > > > > > > > can't keep losing this data every minute, because if I
> > > > > > > > lose too much it becomes meaningful.
> > > > > > > >
> > > > > > > > Maybe I wasn't clear on that matter. I can lose all the
> > > > > > > > data even 20 times a day. Sensitive data is stored using
> > > > > > > > bulk updates or transactions, bypassing that "delayed
> > > > > > > > write" layer. "0 evictions" is the kind of "persistence"
> > > > > > > > I'm going for. So items are persistent for some very short
> > > > > > > > periods of time (1-5 minutes) without being killed. It's
> > > > > > > > just a different use case. Running in production for 2
> > > > > > > > years, based on 1.4.13, tested for correctness, monitored
> > > > > > > > so we have enough memory and 0 evictions (just reclaims).
> > > > > > > >
> > > > > > > > When I came here with the same idea ~2 years ago you just
> > > > > > > > said it's very stupid; now you even made me look like a
> > > > > > > > moron :) And I can understand why you don't want features
> > > > > > > > that are not ~O(1) perfectly, but please don't get so
> > > > > > > > personal about different ideas of how to do things and
> > > > > > > > different use cases, just because these won't work for
> > > > > > > > you.
> > > > > > > >
> > > > > > > > On Thursday, 10 April 2014 20:53:12 UTC+2, Dormando wrote:
> > > > > > > >
> > > > > > > > > You really really really really really *must* not put
> > > > > > > > > data in memcached which you can't lose.
> > > > > > > > >
> > > > > > > > > Seriously, really don't do it. If you need persistence,
> > > > > > > > > try using a redis instance for the persistent stuff, and
> > > > > > > > > use memcached for your cache stuff. I don't see why you
> > > > > > > > > feel like you need to write your own thing, there're a
> > > > > > > > > lot of persistent key/value stores (kyotocabinet/etc?).
> > > > > > > > > They have a much lower request ceiling and don't handle
> > > > > > > > > the LRU/cache pattern as well, but that's why you can
> > > > > > > > > use both.
> > > > > > > > >
> > > > > > > > > Again, please please don't do it. You are damaging your
> > > > > > > > > company. You are a *danger* to your company.
> > > > > > > > >
> > > > > > > > > On Thu, 10 Apr 2014, Slawomir Pryczek wrote:
> > > > > > > > >
> > > > > > > > > > Hi Dormando, thanks for the suggestions, a background
> > > > > > > > > > thread would be nice... The idea is actually that with
> > > > > > > > > > 2-3GB I get plenty of evictions of items that need to
> > > > > > > > > > be fetched later. And with 16GB I still get evictions;
> > > > > > > > > > actually I could probably throw more memory than 16G
> > > > > > > > > > at it and it'd only result in more expired items
> > > > > > > > > > sitting in the middle of slabs, forever... Now I'm
> > > > > > > > > > going for persistence. Sounds crazy probably, but
> > > > > > > > > > we're having some data that we can't lose:
> > > > > > > > > > 1. statistics: we aggregate writes to the DB using
> > > > > > > > > > memcached (+ a list implementation). If these items
> > > > > > > > > > get evicted we're losing rows in the db. Losing data
> > > > > > > > > > sometimes isn't a big problem, e.g. we restart
> > > > > > > > > > memcached once a week so we're losing 1 minute of data
> > > > > > > > > > every week. But if we have evictions, we're losing
> > > > > > > > > > data constantly (which we can't have).
> > > > > > > > > > 2. we drive a load balancer using statistics data in
> > > > > > > > > > memcached; again, not nice to lose data often, because
> > > > > > > > > > workers can get an incorrect amount of traffic.
> > > > > > > > > > 3. we're doing some adserving optimizations, e.g.
> > > > > > > > > > counting per-domain ad priority; for one domain it
> > > > > > > > > > takes about 10 seconds to analyze all the data and
> > > > > > > > > > create the list of ads, so it can't be done online...
> > > > > > > > > > We put the result of this in memcached; if we lose
> > > > > > > > > > too much of it, the system will start to serve
> > > > > > > > > > suboptimal ads (because it'll need to switch to more
> > > > > > > > > > general data or a much simpler algorithm that can be
> > > > > > > > > > done instantly).
> > > > > > > > > >
> > > > > > > > > > Probably it would be best to rewrite all this using C
> > > > > > > > > > or golang, and use memcached just for caching, but
> > > > > > > > > > it'd take too much time, which we don't currently
> > > > > > > > > > have...
> > > > > > > > > >
> > > > > > > > > > I have seen the twitter and nk implementations that
> > > > > > > > > > seem to do what I need, but they seem old (based on
> > > > > > > > > > old code), so I prefer to modify the code of the
> > > > > > > > > > recent "official" memcached, to not be stuck with old
> > > > > > > > > > code or abandonware. Actually there are many topics
> > > > > > > > > > about the limitations of the current eviction algo,
> > > > > > > > > > and an option to enable some background thread to do
> > > > > > > > > > scraping based on statistics of the most-filled slabs
> > > > > > > > > > (with some parameter to specify whether it should take
> > > > > > > > > > a light or aggressive approach) would be nice...
> > > > > > > > > >
> > > > > > > > > > As for the code... is that the slab_rebalance_move
> > > > > > > > > > function in slabs.c? It seems a little difficult to
> > > > > > > > > > grasp without some docs of how things work... can you
> > > > > > > > > > please write a very short description of how this
> > > > > > > > > > "angry birds" mode works?
> > > > > > > > >
> > > > > > > > > Look at doc/protocol.txt for explanations of the slab
> > > > > > > > > move options. The names are greppable back to the
> > > > > > > > > source.
> > > > > > > > >
> > > > > > > > > > I have a quick question about this... linked is an
> > > > > > > > > > item that's placed on the linked list, but what do the
> > > > > > > > > > other flags mean, and why are the last 2 of them
> > > > > > > > > > temporary?
> > > > > > > > > > #define ITEM_LINKED 1
> > > > > > > > > > #define ITEM_CAS 2
> > > > > > > > > >
> > > > > > > > > > /* temp */
> > > > > > > > > > #define ITEM_SLABBED 4
> > > > > > > > > > #define ITEM_FETCHED 8
> > > > > > > > > >
> > > > > > > > > > This from slab_rebalance_move seems interesting:
> > > > > > > > > > refcount = refcount_incr(&it->refcount);
> > > > > > > > > > ...
> > > > > > > > > > if (refcount == 1) { /* item is unlinked, unused */
> > > > > > > > > > ...
> > > > > > > > > > } else if (refcount == 2) { /* item is linked but not
> > > > > > > > > > busy */
> > > > > > > > > >
> > > > > > > > > > Are there some docs about refcounts, locks and item
> > > > > > > > > > states? Basically, why is an item with refcount 2 not
> > > > > > > > > > busy? You're increasing the refcount by 1 on select,
> > > > > > > > > > then again when reading data? Can the refcount ever be
> > > > > > > > > > higher than 2 (3 in the above case), meaning 2 threads
> > > > > > > > > > can access the same item?
> > > > > > > > >
> > > > > > > > > The comment on the same line is explaining exactly what
> > > > > > > > > it means.
> > > > > > > > >
> > > > > > > > > Unfortunately it's a bit of a crap shoot. I think I
> > > > > > > > > wrote a threads explanation somewhere (some release
> > > > > > > > > notes, or in a file in there, I can't quite remember
> > > > > > > > > offhand). Since scaling the thread code it got a lot
> > > > > > > > > more complicated. You have to be extremely careful under
> > > > > > > > > what circumstances you access items (you must hold an
> > > > > > > > > item lock + the refcount must be 2 if you want to unlink
> > > > > > > > > it).
> > > > > > > > >
> > > > > > > > > You'll just have to study it a bit, sorry. Grep around
> > > > > > > > > to see where the flags are used.
> > > > > > > > >
> > > > > > > > > > Thanks.
> > > > > > > > > >
> > > > > > > > > > On Thursday, 10 April 2014 06:05:30 UTC+2, Dormando
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > > Hi Guys,
> > > > > > > > > > > > I'm running a specific case where I don't want
> > > > > > > > > > > > (actually can't have) evicted items (evictions = 0
> > > > > > > > > > > > ideally)... now I have created some simple algo
> > > > > > > > > > > > that locks the cache, goes through the linked list
> > > > > > > > > > > > and evicts items...
> > > > > > > > > > > > It causes some problems, like 10-20ms cache locks
> > > > > > > > > > > > in some cases.
> > > > > > > > > > > >
> > > > > > > > > > > > Now I'm thinking about going through each slab's
> > > > > > > > > > > > memory (slabs keep a list of allocated memory
> > > > > > > > > > > > regions)... looking for items; if an expired item
> > > > > > > > > > > > is found, evict it... this way I can go e.g. 10k
> > > > > > > > > > > > items or 1MB of memory at a time + pick slabs
> > > > > > > > > > > > with high utilization and run this "additional"
> > > > > > > > > > > > eviction only on them... so it'll prevent
> > > > > > > > > > > > allocating memory just because unneeded data with
> > > > > > > > > > > > a short TTL is occupying the HEAD of the list.
> > > > > > > > > > > >
> > > > > > > > > > > > With this linked list eviction I'm able to run on
> > > > > > > > > > > > 2-3GB of memory... without it, 16GB of memory is
> > > > > > > > > > > > exhausted in 1-2h and then memcached starts to
> > > > > > > > > > > > kill "good" items (leaving expired ones wasting
> > > > > > > > > > > > memory)...
> > > > > > > > > > > >
> > > > > > > > > > > > Any comments?
> > > > > > > > > > > > Thanks.
> > > > > > > > > > >
> > > > > > > > > > > You're going a bit against the base algorithm. If
> > > > > > > > > > > stuff is falling out of 16GB of memory without ever
> > > > > > > > > > > being utilized again, why is that critical? Sounds
> > > > > > > > > > > like you're optimizing the numbers instead of
> > > > > > > > > > > actually tuning anything useful.
> > > > > > > > > > >
> > > > > > > > > > > That said, you can probably just extend the slab
> > > > > > > > > > > rebalance code. There's a hook in there (which I
> > > > > > > > > > > called "angry birds mode") that drives a slab
> > > > > > > > > > > rebalance when it'd otherwise run an eviction. That
> > > > > > > > > > > code already safely walks the slab page for unlocked
> > > > > > > > > > > memory and frees it; you could edit it slightly to
> > > > > > > > > > > check for expiration and then freelist it into the
> > > > > > > > > > > slab class instead.
> > > > > > > > > > >
> > > > > > > > > > > Since it's already a background thread, you could
> > > > > > > > > > > further modify it to just wake up and walk pages for
> > > > > > > > > > > stuff to evict.
--

---
You received this message because you are subscribed to the Google Groups "memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.