I have a performance problem using a dimensional model where the date is specified in a DATE dimension, specifically when using 'WHERE DATE >= 'Some Date'
This query runs very fast when using an equality expression, eg. 'WHERE DATE = '2014-01-01", and I'm wondering if there is a way to make it run fast when using the greater than expression. The dimension table is about 5k rows, and the Fact table is ~60M. Thanks in advance for any advice. JT. The query : select sid, count(*) from fact fact_data fact left outer join dim_date dim on dim.date_id = fact.date_id where dim.date >= '2014-1-25' group by sid order by count desc limit 10; FACT Table Definition: Table "public.fact_data" Column | Type | Modifiers ---------------+-----------------------------+----------- date_id | integer | date | timestamp without time zone | agent_id | integer | instance_id | integer | sid | integer | Indexes: "fact_agent_id" btree (agent_id) "fact_date_id" btree (date_id) CLUSTER "fact_alarms_sid" btree (sid) Table "public.dim_date" Column | Type | Modifiers --------------------+---------+------------------------------------------------------------ date_id | integer | not null default nextval('dim_date_date_id_seq'::regclass) date | date | year | integer | month | integer | month_name | text | day | integer | day_of_year | integer | weekday_name | text | calendar_week | integer | quarter | text | year_quarter | text | year_month | text | year_calendar_week | text | weekend | text | week_start_date | date | week_end_date | date | month_start_date | date | month_end_date | date | Indexes: "dim_date_date" btree (date) "dim_date_date_id" btree (date_id) EXPLAIN Output: explain (analyze, buffers) select dim.date_id, fact.sid, count(1) from fact_data fact left outer join dim_date dim on dim.date_id = fact.date_id where dim.date_id >= 5139 group by 1,2 order by 3 desc limit 10; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=9772000.55..9772000.58 rows=10 width=8) (actual time=91064.421..91064.440 rows=10 loops=1) Buffers: shared hit=4042 read=1542501 -> Sort (cost=9772000.55..9787454.06 rows=6181404 width=8) (actual time=91064.408..91064.414 rows=10 loops=1) Sort Key: (count(1)) Sort Method: top-N heapsort Memory: 25kB Buffers: shared hit=4042 read=1542501 -> GroupAggregate (cost=9150031.23..9638422.63 rows=6181404 width=8) (actual time=90892.625..91063.905 rows=617 loops=1) Buffers: shared hit=4042 read=1542501 -> Sort (cost=9150031.23..9256675.57 rows=42657736 width=8) (actual time=90877.129..90964.995 rows=124965 loops=1) Sort Key: dim.date_id, fact.sid Sort Method: quicksort Memory: 8930kB Buffers: shared hit=4042 read=1542501 -> Hash Join (cost=682.34..3160739.50 rows=42657736 width=8) (actual time=45087.394..90761.624 rows=124965 loops=1) Hash Cond: (fact.date_id = dim.date_id) Buffers: shared hit=4042 read=1542501 -> Seq Scan on fact_data fact (cost=0.00..2139866.40 rows=59361340 width=8) (actual time=0.090..47001.500 rows=59360952 loops=1) Buffers: shared hit=3752 read=1542501 -> Hash (cost=518.29..518.29 rows=13124 width=4) (actual time=21.083..21.083 rows=13125 loops=1) Buckets: 2048 Batches: 1 Memory Usage: 462kB Buffers: shared hit=290 -> Seq Scan on dim_date dim (cost=0.00..518.29 rows=13124 width=4) (actual time=0.494..10.918 rows=13125 loops=1) Filter: (date_id >= 5139) Rows Removed by Filter: 5138 Buffers: shared hit=290 Total runtime: 91064.496 ms (25 rows)