Hi David,

Early in this thread, Pavel suggested:

> you should partial index
> 
> create index foo(b) on mytable where a is null;

Rather, you might try the opposite partial index (where a is NOT null) as a 
replacement for the original unqualified index on column A.  This new index 
will be ignored by the query you're trying to tune, but it'll be available to 
the other queries that filter to a non-null value of column A.  (Omitting NULL 
from that index should be ok because you normally wouldn't want to use an index 
when 95% of the table's rows match the filtered key.)

Then you can temporarily disable Seq Scans in your session for just this one 
query, as follows:

SQL> create table my_table ( a int, b int ) ;
CREATE TABLE

SQL> create index idx_a_not_null on my_table ( a ) where a is not null ;
CREATE INDEX

SQL> create index idx_b on my_table ( b ) ;
CREATE INDEX

SQL> insert into my_table (a, b)
select
  case when random() <= 0.95 then null else i end as a,
  mod(i, 10) as b
from generate_series(1, 10000000) s(i)
;
INSERT 0 10000000

SQL> analyze my_table ;
ANALYZE


Review the statistics available to the optimizer:

SQL> select attname, null_frac, n_distinct, most_common_vals, 
most_common_freqs, histogram_bounds, correlation
from pg_stats
where tablename = 'my_table'
order by attname
;
 attname | null_frac | n_distinct |   most_common_vals    |                     
             most_common_freqs                                   |              
                      histogram_bounds                                    | 
correlation
---------+-----------+------------+-----------------------+--------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+-------------
 a       |     0.945 |         -1 |                       |                     
                                                                 | 
{2771,1301755,2096051,3059786,3680728,4653531,5882434,6737141,8240245,9428702,9875768}
 |           1
 b       |         0 |         10 | {9,4,3,1,2,6,8,5,7,0} | 
{0.110333,0.104,0.102333,0.100333,0.100333,0.0996667,0.0986667,0.0983333,0.096,0.09}
 |                                                                              
          |    0.127294
(2 rows)

SQL> select relname, reltuples, relpages from pg_class where relname in 
('my_table', 'idx_a_not_null', 'idx_b') order by relname ;
    relname     | reltuples | relpages
----------------+-----------+----------
 idx_a_not_null |    499955 |     1100
 idx_b          |     1e+07 |    21946
 my_table       |     1e+07 |    39492
(3 rows)


Run the test query, first without disabling Seq Scan to show this example 
reproduces the plan you're trying to avoid.

SQL> explain analyze select * from my_table where a is null and b = 5 limit 15 ;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.66 rows=15 width=8) (actual time=0.070..0.263 rows=15 
loops=1)
   ->  Seq Scan on my_table  (cost=0.00..164492.00 rows=929250 width=8) (actual 
time=0.061..0.159 rows=15 loops=1)
         Filter: ((a IS NULL) AND (b = 5))
 Total runtime: 0.371 ms
(4 rows)


Now run the same query without the Seq Scan option.

SQL> set enable_seqscan = false ;
SET

SQL> explain analyze select * from my_table where a is null and b = 5 limit 15 ;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..46.33 rows=15 width=8) (actual time=0.081..0.232 rows=15 
loops=1)
   ->  Index Scan using idx_b on my_table  (cost=0.00..2869913.63 rows=929250 
width=8) (actual time=0.072..0.130 rows=15 loops=1)
         Index Cond: (b = 5)
         Filter: (a IS NULL)
 Total runtime: 0.341 ms
(5 rows)

SQL> reset enable_seqscan ;
RESET


Yes, it's unsavory to temporarily adjust a session-level parameter to tune a 
single query, but I don't know of a less intrusive way to avoid the SeqScan.  
Here's why I think it might be your simplest option:

As far as I can tell, the plan nodes for accessing the table/index are unaware 
of the LIMIT.  The cost of the Limit node is estimated as the cost of its input 
row-source multiplied by the ratio of requested/returned rows.  For example, 
from the preceding plan output:
    2869913.63 for "Index Scan" upper cost * (15 row limit / 929250 returned 
rows) = 46.326 upper cost for the "Limit" node
The underlying plan nodes each assume that all the rows matching their filter 
predicates will be returned up the pipeline; the cost estimate is only reduced 
at the Limit node.  A Seq Scan and an Index Scan (over a complete index) will 
both expected the same number of input rows (pg_class.reltuples).  They also 
produce the same estimated result set, since both apply the same filters before 
outputing rows to the next node.  So an Index Scan is always going to have a 
higher cost estimate than an equivalent Seq Scan returning the same result rows 
(unless random_page_cost is < 1).  That's why I think the planner is always 
preferring the plan that uses a Seq Scan.

Hope this helps!



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to