Igniters,

I heard some complaints that, with our page replacement algorithm, index pages can be evicted from memory too often. I reviewed our current implementation, and it looks like we chose a very simple approach: evict random pages, without taking into account their nature (data vs. index) or typical usage patterns (such as scans).
With our heap-based architecture, a typical SQL query is executed as follows:
1) Read non-leaf index pages, then in a loop:
2.1) Read 1 leaf index page
2.2) Read several hundred data pages

This way index pages on average have smaller timestamps than data pages, and hence a good probability of being evicted. Another major problem is scan resistance, which doesn't seem to be addressed at all.

My question is: what was the reason for choosing a random pseudo-LRU algorithm instead of a commonly used variation of *real* LRU (such as LRU-K, 2Q, etc.)? Did we perform any evaluation of its effectiveness?

I am thinking of creating a new IEP to evaluate and possibly improve our page replacement as follows:
1) Implement metrics to count page cache hits/misses by page type [1]
2) Implement a *heat map* which can optionally be enabled to track page hits/misses per page or per specific object (cache, index)
3) Run the heat map on typical workloads (lookups, scans, joins, etc.) to get a baseline
4) Prototype several LRU-based implementations and see if they give any benefit. It makes sense to start with minor improvements to the current algorithm (e.g. favor index pages over data pages, play with the sample size, replace timestamps with read counters, etc.)

In any case, the first two action items would be a good addition to product monitoring.

What do you think?

[1] https://issues.apache.org/jira/browse/IGNITE-8580
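To make the timestamp skew concrete, below is a small, self-contained simulation sketch (hypothetical code, not taken from Ignite; all names and parameters are illustrative assumptions). It models a cache that evicts the oldest-timestamp page out of a small random sample, driven by the access pattern above: each query re-reads 1 hot index page and then reads ~200 cold data pages. A flag lets you try the "favor index pages over data pages" tweak from item 4 for comparison.

```java
import java.util.*;

// Hypothetical sketch of random-sample pseudo-LRU eviction under the
// "1 index page + ~200 data pages per query" pattern described above.
public class PageEvictionSim {
    static final int CACHE_SIZE = 1_000, SAMPLE = 5;
    static final int INDEX_PAGES = 50, DATA_PER_QUERY = 200, QUERIES = 500;

    static final class Page {
        final long id;
        final boolean index;
        long lastAccess;                      // logical timestamp of last read
        Page(long id, boolean index) { this.id = id; this.index = index; }
    }

    /** Returns {indexHits, indexMisses}; favorData = prefer evicting data pages. */
    static long[] simulate(boolean favorData, long seed) {
        Random rnd = new Random(seed);
        Map<Long, Page> cache = new HashMap<>();
        List<Page> pages = new ArrayList<>();  // for O(1) random sampling
        long clock = 0, nextDataId = INDEX_PAGES, hits = 0, misses = 0;

        for (int q = 0; q < QUERIES; q++) {
            // 1 hot (re-read) index page, then DATA_PER_QUERY fresh data pages.
            long[] ids = new long[1 + DATA_PER_QUERY];
            ids[0] = rnd.nextInt(INDEX_PAGES);
            for (int i = 1; i < ids.length; i++) ids[i] = nextDataId++;

            for (long id : ids) {
                boolean idx = id < INDEX_PAGES;
                Page p = cache.get(id);
                if (p != null) {               // cache hit: refresh timestamp
                    p.lastAccess = ++clock;
                    if (idx) hits++;
                    continue;
                }
                if (idx) misses++;
                if (cache.size() >= CACHE_SIZE) {
                    // Random-sample pseudo-LRU: sample a few pages, evict the
                    // oldest (optionally preferring data pages as victims).
                    Page victim = null;
                    int victimPos = -1;
                    for (int s = 0; s < SAMPLE; s++) {
                        int r = rnd.nextInt(pages.size());
                        Page c = pages.get(r);
                        boolean better;
                        if (victim == null)
                            better = true;
                        else if (favorData && victim.index != c.index)
                            better = !c.index; // data page beats index page
                        else
                            better = c.lastAccess < victim.lastAccess;
                        if (better) { victim = c; victimPos = r; }
                    }
                    Page last = pages.remove(pages.size() - 1);
                    if (victimPos < pages.size()) pages.set(victimPos, last);
                    cache.remove(victim.id);
                }
                p = new Page(id, idx);
                p.lastAccess = ++clock;
                cache.put(id, p);
                pages.add(p);
            }
        }
        return new long[] {hits, misses};
    }

    public static void main(String[] args) {
        long[] plain = simulate(false, 42L);
        long[] favor = simulate(true, 42L);
        System.out.printf("plain:       index hits=%d misses=%d%n", plain[0], plain[1]);
        System.out.printf("favor index: index hits=%d misses=%d%n", favor[0], favor[1]);
    }
}
```

Because an index page's timestamp is older than the hundreds of data pages read after it, the sampled eviction keeps hitting hot index pages; biasing victim selection toward data pages brings index misses down to roughly the cold misses. A real evaluation would of course use the metrics/heat map from items 1-3 rather than a toy model.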