[PERFORM] strange estimate for number of rows

Andrew Sullivan Thu, 13 Nov 2003 11:20:17 -0800

Hi all,

I've one here that I cannot fathom.  Any suggestions?


We have a table, call it tablename, where we're selecting by a range
of dates and an identifier.  (This is redacted, obviously):

\d tablename

       Column       |           Type           |     Modifiers      
--------------------+--------------------------+--------------------
 id                 | integer                  | not null
 transaction_date   | timestamp with time zone | not null
 product_id         | integer                  | not null
Indexes:
    "trans_posted_trans_date_idx" btree (transaction_date, product_id)


The statistics on transaction_date and product_id are set to 1000. 
Everything is all analysed nicely.  But I'm getting a poor plan,
because of an estimate that the number of rows to be returned is
about double how many actually are:

explain analyse select * from transactions_posted where
transaction_date >= '2003-9-1' and transaction_date < '2003-10-1' and
product_id = 2;

QUERY PLAN                                                                             
          
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on transactions_posted  (cost=0.00..376630.33 rows=700923
width=91) (actual time=8422.253..36176.078 rows=316029 loops=1)
   Filter: ((transaction_date >= '2003-09-01 00:00:00-04'::timestamp
with time zone) AND (transaction_date < '2003-10-01
00:00:00-04'::timestamp with time zone) AND (product_id = 2))
 Total runtime: 36357.630 ms
(3 rows)

SET enable_seqscan = off;

explain analyse select * from transactions_posted where
transaction_date >= '2003-9-1' and transaction_date < '2003-10-1' and
product_id = 2;

QUERY PLAN                                                                             
            
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using trans_posted_trans_date_idx on transactions_posted
(cost=0.00..1088862.56 rows=700923 width=91) (actual
time=35.214..14816.257 rows=316029 loops=1)
   Index Cond: ((transaction_date >= '2003-09-01
00:00:00-04'::timestamp with time zone) AND (transaction_date <
'2003-10-01 00:00:00-04'::timestamp with time zone) AND (product_id =
2))
 Total runtime: 15009.816 ms
(3 rows)

SELECT attname,null_frac,avg_width,n_distinct,correlation FROM
pg_stats where tablename = 'transactions_posted' AND attname in
('transaction_date','product_id');
     attname      | null_frac | avg_width | n_distinct | correlation 
------------------+-----------+-----------+------------+-------------
 product_id       |         0 |         4 |          2 |    0.200956
 transaction_date |         0 |         8 |  -0.200791 |    0.289248

Any ideas?  I'm loathe to recommend cluster, since the data will not
stay clustered.

A


-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<[EMAIL PROTECTED]>                              M2P 2A8
                                         +1 416 646 3304 x110


---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match

[PERFORM] strange estimate for number of rows

Reply via email to