[GENERAL] Indexes on NULL's and order by ... limit N queries

Maxim Boguk Mon, 01 Dec 2008 04:39:57 -0800

When i read about 8.3 support indexed queries on NULL values (like rubric_id is 
NULL) i was really happy.


But reality strike again... look like NULL in WHERE don't allow effective using
index on (rubric_id, pos) for queries like:
... WHERE rubric_id IS NULL ORDER BY pos LIMIT 5


Here is some details about my issue (all tests on fresh loaded/analyzed into 
empty 8.3.5 DB):


mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where 
cluster_weight.rubric_id=8 order by pos limit 5;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1337.02..1337.03 rows=5 width=28) (actual time=27.556..27.575 
rows=5 loops=1)
   ->  Sort  (cost=1337.02..1340.77 rows=1501 width=28) (actual 
time=27.552..27.558 rows=5 loops=1)
         Sort Key: pos
         Sort Method:  top-N heapsort  Memory: 25kB
         ->  Seq Scan on cluster_weight  (cost=0.00..1312.09 rows=1501 
width=28) (actual time=0.058..25.008 rows=1501 loops=1)
               Filter: (rubric_id = 8)
 Total runtime: 27.638 ms
(7 rows)

ok so we need index on (rubric_id, pos), lets add it:


mboguk_billing=# CREATE INDEX cluster_weight_2 on cluster_weight(rubric_id, 
pos);
CREATE INDEX


And try again:
mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where 
cluster_weight.rubric_id=8 order by pos limit 5;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..1.70 rows=5 width=28) (actual time=0.095..0.122 rows=5 
loops=1)
   ->  Index Scan using cluster_weight_2 on cluster_weight  (cost=0.00..509.31 
rows=1501 width=28) (actual time=0.090..0.104 rows=5 loops=1)
         Index Cond: (rubric_id = 8)
 Total runtime: 0.176 ms
(4 rows)

Ok... so now query works as intended...

Lets check are index used on search NULL values:
mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where 
cluster_weight.rubric_id IS NULL;
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using cluster_weight_2 on cluster_weight  (cost=0.00..1034.21 
rows=26435 width=28) (actual time=0.053..48.123 rows=26435 loops=1)
   Index Cond: (rubric_id IS NULL)
 Total runtime: 85.210 ms
(3 rows)

Yes it is working...

Now lets try main query over NULL:

mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where 
cluster_weight.rubric_id IS NULL ORDER BY pos LIMIT 5;
                                                                        QUERY 
PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1473.29..1473.30 rows=5 width=28) (actual time=92.220..92.239 
rows=5 loops=1)
   ->  Sort  (cost=1473.29..1539.37 rows=26435 width=28) (actual 
time=92.216..92.223 rows=5 loops=1)
         Sort Key: pos
         Sort Method:  top-N heapsort  Memory: 25kB
         ->  Index Scan using cluster_weight_2 on cluster_weight  
(cost=0.00..1034.21 rows=26435 width=28) (actual time=0.033..47.333 rows=26435 
loops=1)
               Index Cond: (rubric_id IS NULL)
 Total runtime: 92.310 ms
(7 rows)

Ooops... that is surprise...
I wasn't ready see that plan here... and performance difference over 1000.

Looks like when indexed search over NULL's added, planner wasn't learned right way use such index on "where something is NULL order by ... limit ..."queries.



Thanks for any responses and sorry for not so good English.

--
SY, Maxim Boguk

--
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[GENERAL] Indexes on NULL's and order by ... limit N queries

Reply via email to