Short summary:

  * It looks to me like the planner vastly overestimates 
    the # of pages read by index scan in quite a few of my
    tables even though stats collected by ANALYZE are correct.

  * The problem happens any time you have multiple columns
    that have a number of repeated values in them, and
    you CLUSTER the table by a sort using both columns
    (like "city,state,zip,phone#" or "firstname,lastname").

  * I think this is the problem that Mark Kirkwood is seeing
    in his threads Query optimizer 8.0.1 and "One Big trend
    vs multiple smaller trends" in hackers.

  * A test script demonstrating the issue also follows.

  * I think keeping one more stat per attribute in
    pg_stastic that could describe this behavior.


Longer:  


  If I understand the optimizer correctly,  correlation is used
  to both guess how much random disk access will be required in
  a query; as well as estimate how many pages will be read.

  Unfortunately, many tables in my larger databases have
  columns with values that are tightly packed on a few pages;
  even though there is no total-ordering across the whole table.
  Stephan Szabo described this as a "clumping effect":
  http://archives.postgresql.org/pgsql-performance/2003-01/msg00286.php


  The test script below shows a table with 6 columns and 
  1 million rows.

  Note that ANALYZE correctly observes that the correlation
  for all the columns except "A" is near zero.

  However, also note that columns A,B,C,and even D will have
  extremely few UNIQUE values on any given page of data.  And
  conversely, an index scan on any of those columns be a good
  choice (as seen by EXPLAIN ANALYZE output below).


  Instead of just storing "correlation",  I would also like
  to store a variation of "correlation" that "pretends" that
  the disk-pages for a particular column were sorted by the
  min value for that column.  Wherever the optimizer would
  use the existing 'correlation' to estimate the number
  of pages that would be accesses; it could use this 
  "correlation discounting the order of blocks" value instead.
  All the math/stastics/theory that suggests correlation is
  good at estimating # of pages would remain intact.
  Anyway... I've talked too much...

The test script showing 
   1) the tables
   2) the values in pg_stats
   3) EXPLAIN ANALYZE of columns showing the problem
follows:


********************************************************************************
********************************************************************************
********************************************************************************
********************************************************************************
********************************************************************************

fli=# create temporary table tmp1mil as
      select * from
             (select generate_series as a from generate_series(0,9)) as a,
             (select generate_series as b from generate_series(0,9)) as b,
             (select generate_series as c from generate_series(0,9)) as c,
             (select generate_series as d from generate_series(0,9)) as d,
             (select generate_series as e from generate_series(0,9)) as e,
             (select generate_series as f from generate_series(0,9)) as f
      order by a,b,c,d,e,f;
fli-# fli-# fli-# fli-# fli-# fli-# fli-# fli-# 
SELECT
fli=# fli=# vacuum analyze tmp1mil;
VACUUM
fli=# select * from pg_stats where tablename='tmp1mil';
 schemaname | tablename | attname | null_frac | avg_width | n_distinct |   
most_common_vals    |                                       most_common_freqs   
                                     | histogram_bounds | correlation 
------------+-----------+---------+-----------+-----------+------------+-----------------------+------------------------------------------------------------------------------------------------+------------------+-------------
 pg_temp_4  | tmp1mil   | a       |         0 |         4 |         10 | 
{7,8,5,0,6,3,4,1,2,9} | 
{0.113,0.106667,0.105667,0.104333,0.101333,0.097,0.095,0.0936667,0.092,0.0913333}
              |                  |           1
 pg_temp_4  | tmp1mil   | b       |         0 |         4 |         10 | 
{1,4,0,5,6,2,9,7,3,8} | 
{0.119333,0.112667,0.107,0.101,0.099,0.0976667,0.0946667,0.092,0.0886667,0.088} 
               |                  |    0.229754
 pg_temp_4  | tmp1mil   | c       |         0 |         4 |         10 | 
{9,5,0,1,8,6,4,7,3,2} | 
{0.114667,0.107,0.103,0.101667,0.101667,0.0996667,0.0956667,0.094,0.0933333,0.0893333}
         |                  |    0.142119
 pg_temp_4  | tmp1mil   | d       |         0 |         4 |         10 | 
{4,3,2,8,7,9,5,6,1,0} | 
{0.114667,0.11,0.108,0.104,0.102667,0.099,0.098,0.0903333,0.0893333,0.084}      
               |                  |   0.0930835
 pg_temp_4  | tmp1mil   | e       |         0 |         4 |         10 | 
{0,5,1,9,4,7,8,2,3,6} | 
{0.109333,0.109333,0.108333,0.100667,0.100333,0.0983333,0.0966667,0.0936667,0.092,0.0913333}
   |                  |    0.138575
 pg_temp_4  | tmp1mil   | f       |         0 |         4 |         10 | 
{2,8,6,0,3,5,9,1,7,4} | 
{0.104667,0.103667,0.102667,0.102333,0.101667,0.100667,0.100667,0.0986667,0.0943333,0.0906667}
 |                  |    0.139991
(6 rows)

fli=# create index tmp1mil__c on tmp1mil(c);
CREATE INDEX
fli=# explain analyze select * from tmp1mil where c=5;
                                                    QUERY PLAN                  
                                  
------------------------------------------------------------------------------------------------------------------
 Seq Scan on tmp1mil  (cost=0.00..19853.00 rows=107001 width=24) (actual 
time=2.164..480.863 rows=100000 loops=1)
   Filter: (c = 5)
 Total runtime: 531.345 ms
(3 rows)

fli=# explain analyze select * from tmp1mil where c=5;
                                                    QUERY PLAN                  
                                  
------------------------------------------------------------------------------------------------------------------
 Seq Scan on tmp1mil  (cost=0.00..19853.00 rows=107001 width=24) (actual 
time=2.191..481.849 rows=100000 loops=1)
   Filter: (c = 5)
 Total runtime: 531.858 ms
(3 rows)


fli=# set enable_seqscan=false;
SET
fli=# explain analyze select * from tmp1mil where c=5;
                                                              QUERY PLAN        
                                                      
--------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using tmp1mil__c on tmp1mil  (cost=0.00..364309.19 rows=107001 
width=24) (actual time=0.102..101.460 rows=100000 loops=1)
   Index Cond: (c = 5)
 Total runtime: 152.103 ms
(3 rows)

fli=# explain analyze select * from tmp1mil where c=5;
                                                              QUERY PLAN        
                                                      
--------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using tmp1mil__c on tmp1mil  (cost=0.00..364309.19 rows=107001 
width=24) (actual time=0.082..101.298 rows=100000 loops=1)
   Index Cond: (c = 5)
 Total runtime: 151.914 ms
(3 rows)

fli=# 

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [EMAIL PROTECTED] so that your
      message can get through to the mailing list cleanly

Reply via email to