> The only problem is that I can't test that this actually would trigger the
> memory problem, because I can't force the plan to use the right join, it
> always reverts to the left join hashing the tmp_q:

I think the table on the "OUTER" side is the one which needs to be iterated
over (not hashed), in order to return each of its rows even if there are no
join partners in the other table.  In your original query, the small table was
being hashed and the large table iterated; maybe that's whats important.

> which is of course much better, but when tmp_q and tmp_r are the results of
> complex stuff that the planner can't estimate, then it gets it wrong, and
> then the issue gets triggered because we are hashing on the big tmp_r, not
> tmp_q.

I was able to get something maybe promising ?  "Batches: 65536 (originally 1)"

I didn't get "Out of memory" error yet, but did crash the server with this one:
postgres=# explain analyze WITH v AS( SELECT * FROM 
generate_series(1,99999999)i WHERE i%10<10 AND i%11<11 AND i%12<12 AND i%13<13 
AND i%14<14 AND i%15<15 AND i%16<16 AND i%17<17 AND i%18<18 AND i%19<19 AND 
i%20<20 AND i%21<21 ) SELECT * FROM generate_series(1,99)k JOIN v ON k=i ;

Note, on pg12dev this needs to be "with v AS MATERIALIZED".

postgres=# SET work_mem='128kB';SET client_min_messages =log;SET 
log_statement_stats=on;explain(analyze,timing off) WITH v AS( SELECT * FROM 
generate_series(1,999999)i WHERE i%10<10 AND i%11<11 AND i%12<12 AND i%13<13 
AND i%14<14 AND i%15<15 AND i%16<16 AND i%17<17 AND i%18<18 AND i%19<19 AND 
i%20<20 AND i%21<21 ) SELECT * FROM generate_series(1,99)k JOIN v ON k=i ;
 Hash Join  (cost=70.04..83.84 rows=5 width=8) (actual rows=99 loops=1)
   Hash Cond: (k.k = v.i)
   CTE v
     ->  Function Scan on generate_series i  (cost=0.00..70.00 rows=1 width=4) 
(actual rows=999999 loops=1)
           Filter: (((i % 10) < 10) AND ((i % 11) < 11) AND ((i % 12) < 12) AND 
((i % 13) < 13) AND ((i % 14) < 14) AND ((i % 15) < 15) AND ((i % 16) < 16) AND 
((i % 17) < 17) AND ((i % 18) < 18) AND ((i % 19) < 19) AND ((i % 20) < 20) AND 
((i % 21) < 21))
   ->  Function Scan on generate_series k  (cost=0.00..10.00 rows=1000 width=4) 
(actual rows=99 loops=1)
   ->  Hash  (cost=0.02..0.02 rows=1 width=4) (actual rows=999999 loops=1)
         Buckets: 4096 (originally 1024)  Batches: 512 (originally 1)  Memory 
Usage: 101kB
         ->  CTE Scan on v  (cost=0.00..0.02 rows=1 width=4) (actual 
rows=999999 loops=1)

Justin


Reply via email to