Igniters,

I heard some complaints about our page replacement algorithm: index pages
can be evicted from memory too often. I reviewed our current
implementation and it looks like we have chosen a very simple approach of
evicting random pages, without taking into account their nature (data vs
index) or typical usage patterns (such as scans).

With our heap-based architecture a typical SQL query is executed as follows:
1) Read non-leaf index pages, then in a loop:
2.1) Read 1 leaf index page
2.2) Read several hundred data pages

This way index pages on average have smaller timestamps than data pages and
have a good probability of being evicted.
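To make the effect concrete, here is a rough sketch of that kind of
random-sample eviction (my own simplification, not our actual code; PageMeta
and SAMPLE_SIZE are made-up names):

import java.util.concurrent.ThreadLocalRandom;

class RandomSampleEviction {
    static final int SAMPLE_SIZE = 5; // hypothetical sample size

    static class PageMeta {
        long lastAccessTs; // updated on every read of the page
        boolean index;     // true for index pages, false for data pages
    }

    /** Picks a victim among SAMPLE_SIZE randomly chosen loaded pages. */
    static PageMeta pickVictim(PageMeta[] loadedPages) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        PageMeta victim = null;

        for (int i = 0; i < SAMPLE_SIZE; i++) {
            PageMeta candidate = loadedPages[rnd.nextInt(loadedPages.length)];

            // Oldest timestamp wins: an index page read at the start of a query,
            // before hundreds of data pages, is likely to look "coldest" here.
            if (victim == null || candidate.lastAccessTs < victim.lastAccessTs)
                victim = candidate;
        }

        return victim;
    }
}

An index page touched once at the beginning of a query competes against data
pages touched much later, so whenever it gets sampled it almost always loses.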

Another major problem is scan resistance, which doesn't seem to be addressed
at all.

My question is: what was the reason for choosing a random-pseudo-LRU
algorithm instead of a commonly used variation of *real* LRU (such as LRU-K,
2Q, etc.)? Did we perform any evaluation of its effectiveness?
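For reference, here is a very rough sketch of the 2Q idea (my own
simplification with an invented class name and no A1out "ghost" queue), just
to show where the scan resistance comes from:

import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;

class TwoQueueSketch<K> {
    private final int fifoCap;
    private final int lruCap;

    private final ArrayDeque<K> fifo = new ArrayDeque<>(); // "A1in": first-time pages
    private final LinkedHashMap<K, Boolean> lru;           // "Am": re-accessed pages

    TwoQueueSketch(int fifoCap, int lruCap) {
        this.fifoCap = fifoCap;
        this.lruCap = lruCap;

        // Access-ordered LinkedHashMap evicts the least recently used entry.
        this.lru = new LinkedHashMap<K, Boolean>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<K, Boolean> e) {
                return size() > TwoQueueSketch.this.lruCap;
            }
        };
    }

    /** Called on every page access. */
    void touch(K pageId) {
        if (lru.containsKey(pageId)) {
            lru.get(pageId);                 // refresh LRU position
        }
        else if (fifo.remove(pageId)) {
            lru.put(pageId, Boolean.TRUE);   // second access -> promote to main LRU
        }
        else {
            fifo.addLast(pageId);            // first access -> probation FIFO
            if (fifo.size() > fifoCap)
                fifo.removeFirst();          // a scan only churns this small queue
        }
    }
}

A full scan only cycles pages through the small FIFO, while pages touched a
second time (typically index pages) survive in the main LRU.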

I am thinking of creating a new IEP to evaluate and possibly improve our page
replacement as follows:
1) Implement metrics to count page cache hits/misses by page type [1] (see
the sketch after this list)
2) Implement a *heat map* which can optionally be enabled to track page
hits/misses per page or per specific object (cache, index)
3) Run the heat map on typical workloads (lookups, scans, joins, etc.) to get
a baseline
4) Prototype several LRU-based implementations and see if they give any
benefit. It makes sense to start with minor improvements to the current
algorithm (e.g. favor index pages over data pages, play with the sample size,
replace timestamps with read counters, etc.).
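For item 1, I imagine something along these lines (the PageType enum and
method names are invented for illustration, not existing Ignite APIs):

import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

class PageCacheMetrics {
    enum PageType { DATA, INDEX_LEAF, INDEX_INNER, META }

    private final Map<PageType, LongAdder> hits = new EnumMap<>(PageType.class);
    private final Map<PageType, LongAdder> misses = new EnumMap<>(PageType.class);

    PageCacheMetrics() {
        for (PageType t : PageType.values()) {
            hits.put(t, new LongAdder());
            misses.put(t, new LongAdder());
        }
    }

    void onHit(PageType type)  { hits.get(type).increment(); }

    void onMiss(PageType type) { misses.get(type).increment(); }

    /** Hit ratio per page type, e.g. to expose via the metrics framework. */
    double hitRatio(PageType type) {
        long h = hits.get(type).sum();
        long m = misses.get(type).sum();

        return (h + m) == 0 ? 0 : (double)h / (h + m);
    }
}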

In any case the first two action items would be a good addition to product
monitoring.

What do you think?

[1] https://issues.apache.org/jira/browse/IGNITE-8580
