Re: std.allocator: FreeList uses simple statistics to control number of items
On 5/22/15 9:26 AM, Martin Nowak wrote: A one-pole low pass filter is a very efficient moving average. https://github.com/D-Programming-Language/druntime/blob/6e55b7aaff7566d374c2f253f831d3489e7fd1a5/src/gc/gc.d#L1732 Feed it with 1 on hit and 0 on miss to get Phit on average. On 5/22/15 9:19 AM, Martin Nowak wrote: You can look at jemalloc which has a GC algorithm for this problem. https://m.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919 These are both solid. Thanks!! -- Andrei
Re: std.allocator: FreeList uses simple statistics to control number of items
On Wednesday, 20 May 2015 at 17:28:50 UTC, Andrei Alexandrescu wrote: 1. How to update Pmiss efficiently? Most allocations should be fast, so it shouldn't be update with every call to allocate(). What I have now is I update every K calls. A one-pole low pass filter is a very efficient moving average. https://github.com/D-Programming-Language/druntime/blob/6e55b7aaff7566d374c2f253f831d3489e7fd1a5/src/gc/gc.d#L1732 Feed it with 1 on hit and 0 on miss to get Phit on average.
Re: std.allocator: FreeList uses simple statistics to control number of items
On Wednesday, 20 May 2015 at 17:28:50 UTC, Andrei Alexandrescu wrote: There is a bothersome issue with freelists fronting general-purpose allocators (https://en.wikipedia.org/wiki/Free_list): they can grow indefinitely. Because they keep memory allocated in their parent, they cause fragmentation to it. You can look at jemalloc which has a GC algorithm for this problem. https://m.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919
Re: std.allocator: FreeList uses simple statistics to control number of items
On 5/21/15 10:18 AM, "=?UTF-8?B?Ik3DoXJjaW8=?= Martins\" \"" wrote: On Thursday, 21 May 2015 at 15:17:03 UTC, Andrei Alexandrescu wrote: [snip] Could something like this work for the statistics part? enum AdaptabilityWindow= 256; enum AdaptabilitySpeed= 2; // log2 of rescaling divisor if (hit) ++hitCount; else ++missCount; if ((hitCount + missCount) >= Adaptability) { hitCount >>= AdaptabilitySpeed; missCount >>= AdaptabilitySpeed; } // hitCount/(hitCount + missCount) and missCount / (hitCount + missCount) are a decent approximation of p(hit) and p(miss) given recent history // if you can make decisions just by comparing them then you don't have to perform expensive divisions This is based on re-scaling frequencies, similar to what you'd find in model-based compression algorithms. With it you could try to always keep a minimum reserve of blocks just in case, and when freeing a block of fit size make the decision about keeping it in the free-list if the tendency or returning it to the parent if the tendency is to allocate bad sizes. To incrementally reduce the big free-list, you could just reduce the free-list size by a factor proportional to p(miss) - p(hit) Yah, I think that's a good idea. The size of the generated code was getting worrisome. Thanks! -- Andrei
Re: std.allocator: FreeList uses simple statistics to control number of items
On Thursday, 21 May 2015 at 15:17:03 UTC, Andrei Alexandrescu wrote: [snip] Could something like this work for the statistics part? enum AdaptabilityWindow = 256; enum AdaptabilitySpeed = 2; // log2 of rescaling divisor if (hit) ++hitCount; else ++missCount; if ((hitCount + missCount) >= Adaptability) { hitCount >>= AdaptabilitySpeed; missCount >>= AdaptabilitySpeed; } // hitCount/(hitCount + missCount) and missCount / (hitCount + missCount) are a decent approximation of p(hit) and p(miss) given recent history // if you can make decisions just by comparing them then you don't have to perform expensive divisions This is based on re-scaling frequencies, similar to what you'd find in model-based compression algorithms. With it you could try to always keep a minimum reserve of blocks just in case, and when freeing a block of fit size make the decision about keeping it in the free-list if the tendency or returning it to the parent if the tendency is to allocate bad sizes. To incrementally reduce the big free-list, you could just reduce the free-list size by a factor proportional to p(miss) - p(hit)
Re: std.allocator: FreeList uses simple statistics to control number of items
On 5/20/15 11:05 PM, Morbid.Obesity wrote: [snip] Thanks for sharing your insights! -- Andrei
Re: std.allocator: FreeList uses simple statistics to control number of items
On Wednesday, 20 May 2015 at 17:28:50 UTC, Andrei Alexandrescu wrote: 1. How to update Pmiss efficiently? Most allocations should be fast, so it shouldn't be update with every call to allocate(). What I have now is I update every K calls. Have you measured whether it actually makes a difference? I would expect memory access latency to dominate, at least in some scenarios.
Re: std.allocator: FreeList uses simple statistics to control number of items
On Wednesday, 20 May 2015 at 17:28:50 UTC, Andrei Alexandrescu wrote: https://github.com/andralex/phobos/commit/90120fc290bc7840ffbee22766798518b3418e15 There is a bothersome issue with freelists fronting general-purpose allocators (https://en.wikipedia.org/wiki/Free_list): they can grow indefinitely. Because they keep memory allocated in their parent, they cause fragmentation to it. The worst pattern for a freelist is: an application allocates a bunch of blocks of the specific size the freelist is tuned for, then frees most of them and proceeds with allocating other sizes. In that case the large freelist is essentially leaked. My past FreeList implementation had a maximum list length as a parameter. That was quite a primitive way to go about it. But what I want is a way to model the application's behavior and automatically adapt to the allocation patterns. In particular, if demand for blocks of the freelist size dwindles, it should be possible for the freelist length to progressively go down to zero. So there are three possible allocation events: 1. The request is for a fit size and the free list is empty. That's a miss. It will be served from the parent allocator, and upon freeing the block will be added to the free list. 2. The request is for a fit size and the free list is not empty. That's a hit - nice. Serve it from the free list. 3. The request is for an unfit size. That's always served from the parent allocator, and returned to it upon deallocation. What we want is to minimize the likelihood of a miss. Say over the past N allocation requests we have N1, N2, and N3 counts for the three events. Then we can estimate the probability of a miss from these past events: Pmiss = N1 / N Trouble is, all counts keep increasing and there is little adaptation to changing patterns - the stats are averaged over the entire lifetime. It's better to keep stats over a fixed-size rolling window, say the past Nw events. In that case, if a new miss event occurs, we update the estimated miss probability as such: Pmiss = (Pmiss * Nw + 1) / (Nw + 1) That is, in the recent past we've seen Pmiss * Nw misses, now we see one more (+ 1), and we divide everything by the increased number of events. In case of a non-miss (hit or unfit), the update is: Pmiss = Pmiss * Nw / (Nw + 1) Note how the first equation converges to Pmiss = 1 and the second to Pmiss = 0. Nice. In a batch of events Nbatch of which there were Nmiss misses, the update is: Pmiss = (Pmiss * Nw + Nmiss) / (Nw + Nbatch) So after each allocation event, or after a few of them, we have a nice estimate of the probability we're missing the opportunity of allocating off the freelist. What I'm less sure about is how to efficiently wire that information into a feedback loop. Specifically: 1. How to update Pmiss efficiently? Most allocations should be fast, so it shouldn't be update with every call to allocate(). What I have now is I update every K calls. Could you simply not use this in a thread that simply informs quickly that the free list needs to change it's optimization strategy? In the free list allocator, you simply check a flag allocator() { if (needs_size_u) { do whatever } } The threaded optimization routine can run as frequency as one ones and even the frequency could be based on the recent allocation's per second. (since higher allocations per second fragment the list faster, optimization would be needed more than when allocations are low(why keep trying to optimize the list when no allocations are made?). The thread could do such things as defragment the free list in the GC stop the world cycle if it is ever called. If one uses the GC, the free list could piggyback on it's fragmentation routine to clean itself up). 2. How and when should the estimated Pmiss be used to reduce the size of the freelist? What I do now is when Pmiss is below a fixed threshold, I just periodically yank one element off of it. A. When pmiss is large there are more misses and you can yank in "proportion". e.g., addcount = floor(C*(Pmiss^n-1/2)) here we simply fit scale Pmiss(or Phit) into an inter that represents the count to add or yank(if addcount < 0 we yank, which is why we subtract 1/2). C and n are constants that give us our range and "intensity". They can also be optimized, such as over the longer term use of the program(e.g. over hours of using the program C and n slowly change to try and minimize Pmiss. this is a way to actually have the program optimize itself to the specific way the allocations work in context(both programming and with the user specific behavior). e.g., Pmiss = 0, addcount = floor(-C/2). so C represents the maximum count we want to yank. Pmiss = 1, addcoun t= floor(C/2) and hence C represents the maximum to add. (it happens to be symmetric in this case but does not need to be, e.g., one could tailor the algorithms so they ada
Re: std.allocator: FreeList uses simple statistics to control number of items
What you could do is calculate the average allocation size and std deviantions in a moving window, and the z-score for each freelist and use this lookup table: https://www.stat.tamu.edu/~lzhou/stat302/standardnormaltable.pdf If P < 0.10 (maybe use this as a setting) this means the probability of the next allocations going through this freelist is too low and you can decide to tighten the freelist and let the deallocations fall through to the underlying allocator.