Hi all,
I've one here that I cannot fathom. Any suggestions?
We have a table, call it tablename, where we're selecting by a range
of dates and an identifier. (This is redacted, obviously):
\d tablename
Column | Type | Modifiers
--------------------+--------------------------+--------------------
id | integer | not null
transaction_date | timestamp with time zone | not null
product_id | integer | not null
Indexes:
"trans_posted_trans_date_idx" btree (transaction_date, product_id)
The statistics on transaction_date and product_id are set to 1000.
Everything is all analysed nicely. But I'm getting a poor plan,
because of an estimate that the number of rows to be returned is
about double how many actually are:
explain analyse select * from transactions_posted where
transaction_date >= '2003-9-1' and transaction_date < '2003-10-1' and
product_id = 2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on transactions_posted (cost=0.00..376630.33 rows=700923
width=91) (actual time=8422.253..36176.078 rows=316029 loops=1)
Filter: ((transaction_date >= '2003-09-01 00:00:00-04'::timestamp
with time zone) AND (transaction_date < '2003-10-01
00:00:00-04'::timestamp with time zone) AND (product_id = 2))
Total runtime: 36357.630 ms
(3 rows)
SET enable_seqscan = off;
explain analyse select * from transactions_posted where
transaction_date >= '2003-9-1' and transaction_date < '2003-10-1' and
product_id = 2;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using trans_posted_trans_date_idx on transactions_posted
(cost=0.00..1088862.56 rows=700923 width=91) (actual
time=35.214..14816.257 rows=316029 loops=1)
Index Cond: ((transaction_date >= '2003-09-01
00:00:00-04'::timestamp with time zone) AND (transaction_date <
'2003-10-01 00:00:00-04'::timestamp with time zone) AND (product_id =
2))
Total runtime: 15009.816 ms
(3 rows)
SELECT attname,null_frac,avg_width,n_distinct,correlation FROM
pg_stats where tablename = 'transactions_posted' AND attname in
('transaction_date','product_id');
attname | null_frac | avg_width | n_distinct | correlation
------------------+-----------+-----------+------------+-------------
product_id | 0 | 4 | 2 | 0.200956
transaction_date | 0 | 8 | -0.200791 | 0.289248
Any ideas? I'm loathe to recommend cluster, since the data will not
stay clustered.
A
--
----
Andrew Sullivan 204-4141 Yonge Street
Afilias Canada Toronto, Ontario Canada
<[EMAIL PROTECTED]> M2P 2A8
+1 416 646 3304 x110
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match