Hi there, I am trying to optimize a simple query that returns first 100 rows that have been updated since a given timestamp (ordered by timestamp and id desc). If there are several rows with the same timestamp I need to a second condition, that states that I want to return rows having the given timestamp and id > given id.
The obvious query is SELECT * FROM register_uz_accounting_entities > WHERE effective_on > '2014-07-11' OR (effective_on = '2014-07-11' AND > id > 1459) > ORDER BY effective_on, id > LIMIT 100 With a composite index on (effective_on, id) Query plan "Limit (cost=4613.70..4613.95 rows=100 width=1250) (actual > time=0.122..0.130 rows=22 loops=1)" > " Buffers: shared hit=28" > " -> Sort (cost=4613.70..4617.33 rows=1453 width=1250) (actual > time=0.120..0.122 rows=22 loops=1)" > " Sort Key: effective_on, id" > " Sort Method: quicksort Memory: 30kB" > " Buffers: shared hit=28" > " -> Bitmap Heap Scan on register_uz_accounting_entities > (cost=35.42..4558.17 rows=1453 width=1250) (actual time=0.036..0.083 > rows=22 loops=1)" > " Recheck Cond: ((effective_on > '2014-07-11'::date) OR > ((effective_on = '2014-07-11'::date) AND (id > 1459)))" > " Buffers: shared hit=28" > " -> BitmapOr (cost=35.42..35.42 rows=1453 width=0) (actual > time=0.026..0.026 rows=0 loops=1)" > " Buffers: shared hit=6" > " -> Bitmap Index Scan on idx2 (cost=0.00..6.49 > rows=275 width=0) (actual time=0.017..0.017 rows=15 loops=1)" > " Index Cond: (effective_on > '2014-07-11'::date)" > " Buffers: shared hit=3" > " -> Bitmap Index Scan on idx2 (cost=0.00..28.21 > rows=1178 width=0) (actual time=0.007..0.007 rows=7 loops=1)" > " Index Cond: ((effective_on = > '2014-07-11'::date) AND (id > 1459))" > " Buffers: shared hit=3" > "Total runtime: 0.204 ms" Everything works as expected. However if I change the constraint to a timestamp/date more in the past (resulting in far more matching rows) the query slows down drastically. > > SELECT * FROM register_uz_accounting_entities > WHERE effective_on > '2014-06-11' OR (effective_on = '2014-06-11' AND id > > 1459) > ORDER BY effective_on, id > LIMIT 100 > > "Limit (cost=0.42..649.46 rows=100 width=1250) (actual > time=516.125..516.242 rows=100 loops=1)" > " Buffers: shared hit=576201" > " -> Index Scan using idx2 on register_uz_accounting_entities > (cost=0.42..106006.95 rows=16333 width=1250) (actual time=516.122..516.229 > rows=100 loops=1)" > " Filter: ((effective_on > '2014-06-11'::date) OR ((effective_on = > '2014-06-11'::date) AND (id > 1459)))" > " Rows Removed by Filter: 797708" > " Buffers: shared hit=576201" > "Total runtime: 516.304 ms" I've tried to optimize this query by pushing down the limit and order by's into explicit subselects. SELECT * FROM ( > SELECT * FROM register_uz_accounting_entities > WHERE effective_on > '2014-06-11' > ORDER BY effective_on, id LIMIT 100 > ) t1 > UNION > SELECT * FROM ( > SELECT * FROM register_uz_accounting_entities > WHERE effective_on = '2014-06-11' AND id > 1459 > ORDER BY effective_on, id LIMIT 100 > ) t2 > ORDER BY effective_on, id > LIMIT 100 > > -- query plan > "Limit (cost=684.29..684.54 rows=100 width=1250) (actual > time=2.648..2.708 rows=100 loops=1)" > " Buffers: shared hit=203" > " -> Sort (cost=684.29..684.79 rows=200 width=1250) (actual > time=2.646..2.672 rows=100 loops=1)" > " Sort Key: register_uz_accounting_entities.effective_on, > register_uz_accounting_entities.id" > " Sort Method: quicksort Memory: 79kB" > " Buffers: shared hit=203" > " -> HashAggregate (cost=674.65..676.65 rows=200 width=1250) > (actual time=1.738..1.971 rows=200 loops=1)" > " Buffers: shared hit=203" > " -> Append (cost=0.42..661.15 rows=200 width=1250) (actual > time=0.054..0.601 rows=200 loops=1)" > " Buffers: shared hit=203" > " -> Limit (cost=0.42..338.62 rows=100 width=1250) > (actual time=0.053..0.293 rows=100 loops=1)" > " Buffers: shared hit=101" > " -> Index Scan using idx2 on > register_uz_accounting_entities (cost=0.42..22669.36 rows=6703 width=1250) > (actual time=0.052..0.260 rows=100 loops=1)" > " Index Cond: (effective_on > > '2014-06-11'::date)" > " Buffers: shared hit=101" > " -> Limit (cost=0.42..318.53 rows=100 width=1250) > (actual time=0.037..0.228 rows=100 loops=1)" > " Buffers: shared hit=102" > " -> Index Scan using idx2 on > register_uz_accounting_entities register_uz_accounting_entities_1 > (cost=0.42..30888.88 rows=9710 width=1250) (actual time=0.036..0.187 > rows=100 loops=1)" > " Index Cond: ((effective_on = > '2014-06-11'::date) AND (id > 1459))" > " Buffers: shared hit=102" > "Total runtime: 3.011 ms" => Very fast. The question is... why is the query planner unable to make this optimization for the slow query? What am I missing? Queries with syntax highlighting https://gist.github.com/jsuchal/0993fd5a2bfe8e7242d1 Thanks in advance.