[HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions

Merlin Moncure Tue, 21 Jun 2016 13:19:16 -0700

Hello hackers,

Observe the following test case (apologies if this is a well
understood problem):


create temp table foo as select generate_series(1,1000000) id;
create index on foo(id);

create temp table bar as select id, id % 100000 = 0 as good from
generate_series(1,1000000) id;
create index on bar(good);

analyze foo;
analyze bar;

explain analyze select * from foo where false or exists (select 1 from
bar where good and foo.id = bar.id);  -- A
explain analyze select * from foo where exists (select 1 from bar
where good and foo.id = bar.id);  -- B

These queries are trivially verified as identical but give very different plans.
A gives
                                                          QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Seq Scan on foo  (cost=0.00..4459425.00 rows=500000 width=4) (actual
time=13.299..130.271 rows=10 loops=1)
   Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
   Rows Removed by Filter: 999990
   SubPlan 1
     ->  Index Scan using bar_good_idx on bar  (cost=0.42..4.45 rows=1
width=0) (never executed)
           Index Cond: (good = true)
           Filter: (good AND (foo.id = id))
   SubPlan 2
     ->  Index Scan using bar_good_idx on bar bar_1  (cost=0.42..4.44
rows=1 width=4) (actual time=0.024..0.055 rows=10 loops=1)
           Index Cond: (good = true)
           Filter: good
 Planning time: 0.103 ms
 Execution time: 130.312 ms

B gives
                                                          QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Nested Loop  (cost=4.87..12.91 rows=1 width=4) (actual
time=0.075..0.161 rows=10 loops=1)
   ->  HashAggregate  (cost=4.45..4.46 rows=1 width=4) (actual
time=0.058..0.060 rows=10 loops=1)
         Group Key: bar.id
         ->  Index Scan using bar_good_idx on bar  (cost=0.42..4.44
rows=1 width=4) (actual time=0.018..0.045 rows=10 loops=1)
               Index Cond: (good = true)
               Filter: good
   ->  Index Only Scan using foo_id_idx on foo  (cost=0.42..8.44
rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=10)
         Index Cond: (id = bar.id)
         Heap Fetches: 10
 Planning time: 0.193 ms
 Execution time: 0.187 ms

This is a general problem to OR expressions while AND expressions will
generally pass the optimization through.   The 'old school'
optimization approach is to rewrite the OR expressions to UNION ALL
but this can have unpleasant downstream effects on the query in real
world scenarios.  The question is: can the one time filter logic be
expanded such the first query can be functionally be written into the
second one?  This type of query happens a lot when trying to mix
multiple different filtering expressions (a 'filter mode' if you will)
in a single query based on a user supplied switch.  Food for thought.

merlin

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions

Reply via email to