I ran into a query that shows a performance regression related to the Memoize node.
create table t (a int, b int, c int); insert into t select i%3, i, i from generate_series(1,500)i; analyze t; explain (analyze, costs off, timing off) select * from t t1 join lateral (select t2.a, t2.b, t1.a x from t t2, t t3 offset 0) s on s.a = t1.a; with enable_memoize set to on: Planning Time: 2.470 ms Execution Time: 98869.240 ms with enable_memoize set to off: Planning Time: 1.791 ms Execution Time: 55754.080 ms This shows a 77.3% performance regression with Memoize enabled. The stats of the Memoize node shows some clues: -> Memoize (actual rows=83334.00 loops=500) Cache Key: t1.a Cache Mode: binary Hits: 0 Misses: 500 Evictions: 498 Overflows: 0 Memory Usage: 8193kB There are 0 cache hits, and too many cache evictions. So I suspect that during the phase of filling the Memoize cache, the memory usage exceeds the specified limit, causing cache entries to be repeatedly evicted. While cost_memoize_rescan() does account for the eviction ratio when estimating the cost of Memoize, the estimate does not seem to be accurate enough in this case to prevent the planner from choosing a Memoize node. Any thoughts on how we might improve this? Thanks Richard