On Wed, Feb 15, 2017 at 9:06 AM, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
> Dmitriy,
>
> For the current off-heap approach, we only have a per-entry LRU eviction
> policy which does not fit the page memory architecture well. I like the
> adaptation of the CLOCK-Pro algorithm because it requires a significantly
> smaller amount of memory to track page updates and is scan-resistant.

Agree.

> I want to back Ivan here and discuss whether we need synchronized
> evictions at all. Given that the current implementation has flaws, and
> that a correct implementation would have to work with the same protocol
> guarantees as a cache update (meaning two-phase commit for transactional
> caches on every operation, including reads), I would rather drop
> synchronous evictions altogether.
>
> Thoughts?

Agree again. Synchronous evictions only make sense in a pure in-memory
approach, where there is no database at all. AFAIK, most Ignite deployments
are backed by a database. If not, then users should control the evictions
themselves. However, I am always concerned when removing a feature. If we
start getting complaints, we may have to add it back in 2.0.

> 2017-02-15 0:59 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
>
> > Ivan, thanks for the detailed explanation. My preference would be to
> > support all of the suggested eviction policies and control them from the
> > configuration. However, it does sound like K-random-LRU or K-random-LRU-2
> > will be much easier to support than LIRS.
> >
> > Don't we already have something like K-random-LRU in place?
> >
> > D.
> >
> > On Tue, Feb 14, 2017 at 9:13 AM, Ivan Rakov <ivan.glu...@gmail.com> wrote:
> >
> > > Hello Igniters,
> > >
> > > I want to share some thoughts about supporting eviction on the new
> > > page memory storage.
> > >
> > > I think we should reconsider eviction policies in their current state.
> > > The current pluggable eviction policy mechanism allows determining
> > > which key will be evicted from the cache and when eviction should be
> > > performed.
> > > There are two issues with it:
> > >
> > > 1) Such a mechanism tends to make the off-heap storage highly
> > > fragmented: a batch of small evicted entries can accumulate the needed
> > > total amount of free space in the FreeList, but it would still be
> > > impossible to store one big entry.
> > >
> > > 2) Growth of the cache size implies creating O(n) heap objects and
> > > links to maintain the eviction policy, which causes linear growth of
> > > GC load. That doesn't suit us: getting rid of GC overhead was one of
> > > the main reasons for keeping cache data in off-heap memory.
> > >
> > > I propose to use "whole page eviction, off-heap eviction metadata" as
> > > the default mode for eviction. It's almost non-deterministic, because
> > > the actual eviction result depends on how data is packed into pages,
> > > but it is more efficient and shouldn't cause long-term performance
> > > degradation due to fragmentation.
> > >
> > > I can suggest two ways to implement it:
> > >
> > > 1) *K-random-LRU*
> > > The idea is to allocate a big off-heap array to track the timestamp of
> > > the last usage of each data page.
> > > When the data page at address X is accessed, we store the current
> > > timestamp at position X / PAGE_SIZE of the array.
> > > When there's a need for eviction, we randomly choose K indices of the
> > > array (if some of the indices point to non-data pages, we re-choose
> > > them) and evict the data page with the minimum timestamp. In case K=5,
> > > we'll evict a page from the 17% least recently used set with 50%
> > > probability <http://math.stackexchange.com/questions/786392/expectation-of-minimum-of-n-i-i-d-uniform-random-variables>.
> > > Non-data pages can be simply recognized by having zero in the
> > > corresponding position of the timestamp array.
> > > If an entry is too big and occupies more than one page, only the first
> > > page will be tracked, and the others will be considered non-data pages.
> > > *K-random-LRU-2* is a promising variant of this approach.
> > > We'll evict the page with the minimum time of penultimate access. The
> > > implementation is mostly the same; the difference is that we should
> > > store two timestamps for each data page.
> > > LRU-2 outperforms <http://www.cs.cmu.edu/%7Echristos/courses/721-resources/p297-o_neil.pdf>
> > > LRU by resolving the "one-hit wonder" problem: if a page is accessed
> > > very rarely, but is accidentally accessed once, it's "protected" from
> > > eviction for a long time.
> > > Memory overhead: a timestamp can be packed into 4 bytes. A 100GB
> > > off-heap cache with standard 2KB pages requires 200MB of additional
> > > memory (400MB in the case of the LRU-2 strategy).
> > >
> > > 2) *Adaptation of CLOCK-Pro*
> > > CLOCK-Pro (article <https://www.usenix.org/legacy/event/usenix05/tech/general/full_papers/jiang/jiang_html/html.html>,
> > > presentation <http://www.slideshare.net/huliang64/clockpro>) is a
> > > modern page replacement algorithm used in actual OS kernels. It is an
> > > approximation of LIRS <http://web.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-02-6.pdf>
> > > with a significant advantage for our case: instead of a doubly linked
> > > list, which is hard to maintain in off-heap memory, CLOCK-Pro uses
> > > only a circular list and three pointers. However, we can't use
> > > CLOCK-Pro as is: there's a difference between OS paged memory and the
> > > Ignite off-heap storage. In an OS, if a memory page is swapped out, it
> > > still contains the same data, with the same relevance and patterns of
> > > access. That's why keeping metadata for a swapped-out page (see the
> > > Non-resident Cold Pages set) works. In Ignite, when we evict a page,
> > > all its data is discarded; entries with the same keys may appear in
> > > the cache again, but they will be evenly distributed across the whole
> > > storage.
> > > I propose to use a simplified version of CLOCK-Pro with only two
> > > pointers and two page sets: hot and cold. Non-resident pages won't be
> > > tracked.
> > > To implement it, we should allocate a big off-heap array to track
> > > flags for each page. We should store a (hot, cold) bit, an (accessed,
> > > not accessed) bit, and one bit indicating that the page is a data
> > > page, 3 bits per page in total.
> > > A 100GB off-heap cache with standard 2KB pages requires ~20MB of
> > > additional memory, or 50MB in case we don't want to pack metadata for
> > > multiple pages into one byte.
> > > The memory overhead is really small, but we can't predict the average
> > > and maximum time for one eviction; it has to be discovered by
> > > experiment (in a hypothetical worst case, the clock hand can travel
> > > through the whole array).
> > > Also, the hot/cold ratio can affect performance dramatically and
> > > should be tuned well.
> > >
> > > A note about classic "key" eviction: fair per-entry policies (FIFO,
> > > LRU, Sorted) may be useful for some users, and we may want to support
> > > them as well. It's no problem at all to implement them on-heap
> > > (moreover, the current implementations of EvictionPolicy will work if
> > > we make minor changes in the code). In off-heap, FIFO is easy to
> > > implement (a circular buffer), but LRU (and the more effective LIRS,
> > > if someone wants it) requires doubly linked lists. The only option I
> > > see is using our BPlusTree with some kind of implicit key, but this
> > > implies O(log n) overhead on each entry read access.
> > >
> > > Another note: the current GridCacheEvictionManager has an eviction
> > > backup synchronization mechanism. Do we want to support it for the
> > > off-heap storage as well?
> > >
> > > Please let me know if my understanding of any algorithm or part of the
> > > system is wrong.
> > >
> > > --
> > > Best Regards,
> > > Ivan Rakov
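[Editor's note: the K-random-LRU scheme described in the thread above can be sketched as follows. This is a minimal single-threaded illustration, not Ignite code; the class and method names are hypothetical, and a plain on-heap int[] stands in for the off-heap timestamp array (one 4-byte slot per page, zero marking a non-data page).]

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical sketch of K-random-LRU page eviction; not an Ignite API. */
class KRandomLru {
    private final int[] lastAccessSec; // one slot per page; 0 = non-data page
    private final int k;               // sample size, e.g. K = 5

    KRandomLru(int pages, int k) {
        this.lastAccessSec = new int[pages];
        this.k = k;
    }

    /** Record an access to the data page at the given index. */
    void touch(int pageIdx) {
        lastAccessSec[pageIdx] = (int) (System.currentTimeMillis() / 1000);
    }

    /** Sample k random indices and evict the least recently used data page. */
    int pickVictim() {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        int victim = -1;
        int oldest = Integer.MAX_VALUE;
        for (int i = 0; i < k; i++) {
            int idx = rnd.nextInt(lastAccessSec.length);
            int ts = lastAccessSec[idx];
            if (ts == 0)
                continue; // non-data page sampled: skip it
            if (ts < oldest) {
                oldest = ts;
                victim = idx;
            }
        }
        return victim; // -1 if no data page was found in the sample
    }
}
```

K-random-LRU-2 would differ only in keeping two timestamps per slot and comparing the penultimate one, at twice the metadata cost, as the mail notes.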
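[Editor's note: the simplified hot/cold clock proposed above might look roughly like the sketch below. This is a loose illustration of the per-page flag idea, not the full CLOCK-Pro algorithm (it uses one hand rather than two, and a byte per page rather than the 3-bit packing the mail describes); all names are hypothetical.]

```java
/** Hypothetical sketch of a hot/cold CLOCK-style eviction; not an Ignite API. */
class ClockSketch {
    private static final byte DATA = 1, ACCESSED = 2, HOT = 4;

    private final byte[] flags; // one flag byte per page for clarity
    private int hand;           // clock hand position

    ClockSketch(int pages) {
        flags = new byte[pages];
    }

    /** A newly allocated data page starts cold and not accessed. */
    void onAllocateDataPage(int idx) { flags[idx] = DATA; }

    /** Mark the page as recently accessed. */
    void onAccess(int idx) { flags[idx] |= ACCESSED; }

    /** Advance the hand until a cold, not-recently-accessed data page is found. */
    int pickVictim() {
        // Three full sweeps suffice: accessed bits are cleared on the first
        // pass, idle hot pages are demoted on the next, then a cold page wins.
        for (int scanned = 0; scanned < 3 * flags.length; scanned++) {
            int idx = hand;
            hand = (hand + 1) % flags.length;
            byte f = flags[idx];
            if ((f & DATA) == 0)
                continue;                           // skip non-data pages
            if ((f & ACCESSED) != 0)
                flags[idx] = (byte) ((f | HOT) & ~ACCESSED); // promote, clear bit
            else if ((f & HOT) != 0)
                flags[idx] = (byte) (f & ~HOT);     // demote idle hot page to cold
            else {
                flags[idx] = 0;                     // cold and idle: evict it
                return idx;
            }
        }
        return -1; // no data pages tracked
    }
}
```

The sketch also shows the worst case mentioned in the mail: when every data page is hot and recently accessed, the hand must sweep the array more than once before a victim emerges.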