I ran into a query that shows a performance regression related to the
Memoize node.

create table t (a int, b int, c int);
insert into t select i%3, i, i from generate_series(1,500)i;
analyze t;

explain (analyze, costs off, timing off)
select * from t t1 join lateral
  (select t2.a, t2.b, t1.a x from t t2, t t3 offset 0) s
  on s.a = t1.a;
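For scale, note how large each rescan of the inner side is: the subquery
is a cross join of t2 and t3, filtered laterally on s.a = t1.a. A quick
back-of-the-envelope check (my own arithmetic, not taken from the plan):

```python
# t has 500 rows; a = i % 3 has 3 distinct values, so roughly 500/3 rows
# of t2 survive the lateral filter, and each pairs with all 500 rows of t3.
rows_t = 500
ndistinct_a = 3

rows_per_rescan = (rows_t / ndistinct_a) * rows_t
print(round(rows_per_rescan))  # 83333
```

which lines up with the "actual rows=83334" reported for the Memoize node
below; each cache entry has to hold tens of thousands of rows.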

with enable_memoize set to on:

 Planning Time: 2.470 ms
 Execution Time: 98869.240 ms

with enable_memoize set to off:

 Planning Time: 1.791 ms
 Execution Time: 55754.080 ms

This shows a 77.3% performance regression with Memoize enabled.
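(The figure is just the ratio of the two execution times:)

```python
# slowdown with Memoize on, relative to Memoize off
slowdown_pct = (98869.240 / 55754.080 - 1) * 100
print(round(slowdown_pct, 1))  # 77.3
```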

The stats of the Memoize node show some clues:

   ->  Memoize (actual rows=83334.00 loops=500)
         Cache Key: t1.a
         Cache Mode: binary
         Hits: 0  Misses: 500  Evictions: 498  Overflows: 0  Memory Usage: 8193kB

There are zero cache hits, and nearly every miss results in an eviction.

So I suspect that while the Memoize cache is being filled, memory usage
exceeds the specified limit, causing cache entries to be evicted over and
over before they ever get a chance to be reused.
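A toy LRU simulation of what I think is happening (the numbers here are
hypothetical, chosen so that a single entry is larger than the whole
cache limit; the real eviction bookkeeping is more involved):

```python
from collections import OrderedDict

def simulate(keys, entry_kb, limit_kb):
    """Fill an LRU cache, evicting oldest entries when over the limit."""
    cache = OrderedDict()              # key -> entry size, in LRU order
    used = 0
    hits = misses = evictions = 0
    for k in keys:
        if k in cache:
            hits += 1
            cache.move_to_end(k)       # mark as most recently used
            continue
        misses += 1
        cache[k] = entry_kb
        used += entry_kb
        # Evict least-recently-used entries, but keep the current one.
        while used > limit_kb and len(cache) > 1:
            _, sz = cache.popitem(last=False)
            used -= sz
            evictions += 1
    return hits, misses, evictions

# 500 outer rows cycling over only 3 distinct keys, but each entry
# (hypothetically ~20MB) dwarfs the ~8MB cache limit, so every fill
# evicts the previous entry before it can be reused.
keys = [i % 3 for i in range(1, 501)]
print(simulate(keys, entry_kb=20000, limit_kb=8192))  # (0, 500, 499)
```

The counters come out essentially the same as in the plan above: zero
hits, and one eviction per miss once the cache is over the limit.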

While cost_memoize_rescan() does account for the eviction ratio when
estimating the cost of Memoize, the estimate does not seem to be
accurate enough in this case to prevent the planner from choosing a
Memoize node.
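Paraphrasing from memory how cost_memoize_rescan() derives these ratios
(the real code differs in the details; the clamping of est_cache_entries
to at least one and the exact formulas below are my assumptions), the
problem may be visible even in this rough sketch:

```python
def memoize_ratios(calls, ndistinct, cache_limit_bytes, est_entry_bytes):
    # How many whole entries fit in the cache (clamped to >= 1;
    # this clamp is my assumption, not necessarily what the code does).
    est_cache_entries = max(1, cache_limit_bytes // est_entry_bytes)
    # Fraction of distinct keys expected to be evicted for lack of space.
    evict_ratio = max(0.0, 1.0 - est_cache_entries / ndistinct)
    # Fraction of calls expected to be served from the cache.
    hit_ratio = max(0.0, ((calls - ndistinct) / calls)
                   * (est_cache_entries / max(ndistinct, est_cache_entries)))
    return evict_ratio, hit_ratio

# 500 rescans, 3 distinct keys, an 8MB cache limit, and a hypothetical
# ~10MB per entry (i.e. one entry alone exceeds the limit):
ev, hr = memoize_ratios(500, 3, 8 * 1024**2, 10 * 1024**2)
print(round(ev, 2), round(hr, 2))  # 0.67 0.33
```

If the costing model behaves anything like this, it still predicts ~33%
cache hits even when a single entry cannot fit in the cache, whereas the
actual hit rate is zero; that gap would explain why the planner keeps
choosing Memoize here.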

Any thoughts on how we might improve this?

Thanks
Richard

