I have a performance problem using a dimensional model where the date is
specified in a DATE dimension, specifically when using 'WHERE DATE >= 'Some
Date'

This query runs very fast when using an equality expression, eg. 'WHERE
DATE = '2014-01-01", and I'm wondering if there is a way to make it run
fast when using the greater than expression.

The dimension table is about 5k rows, and the Fact table is ~60M.

Thanks in advance for any advice.

JT.



The query :

select sid, count(*) from fact fact_data fact left outer join dim_date dim
on dim.date_id = fact.date_id where dim.date >= '2014-1-25' group by sid
order by count desc limit 10;

FACT Table Definition:

           Table "public.fact_data"
    Column     |            Type             | Modifiers
---------------+-----------------------------+-----------
 date_id | integer                     |
 date    | timestamp without time zone |
 agent_id      | integer                     |
 instance_id   | integer                     |
 sid           | integer                     |
 Indexes:
    "fact_agent_id" btree (agent_id)
    "fact_date_id" btree (date_id) CLUSTER
    "fact_alarms_sid" btree (sid)


                                 Table "public.dim_date"
       Column       |  Type   |                         Modifiers

--------------------+---------+------------------------------------------------------------
 date_id            | integer | not null default
nextval('dim_date_date_id_seq'::regclass)
 date               | date    |
 year               | integer |
 month              | integer |
 month_name         | text    |
 day                | integer |
 day_of_year        | integer |
 weekday_name       | text    |
 calendar_week      | integer |
 quarter            | text    |
 year_quarter       | text    |
 year_month         | text    |
 year_calendar_week | text    |
 weekend            | text    |
 week_start_date    | date    |
 week_end_date      | date    |
 month_start_date   | date    |
 month_end_date     | date    |
Indexes:
    "dim_date_date" btree (date)
    "dim_date_date_id" btree (date_id)

EXPLAIN Output:

explain (analyze, buffers)  select dim.date_id, fact.sid, count(1) from
fact_data fact left outer join dim_date dim on dim.date_id = fact.date_id
where dim.date_id >= 5139 group by 1,2  order by 3 desc limit 10;

   QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=9772000.55..9772000.58 rows=10 width=8) (actual
time=91064.421..91064.440 rows=10 loops=1)
   Buffers: shared hit=4042 read=1542501
   ->  Sort  (cost=9772000.55..9787454.06 rows=6181404 width=8) (actual
time=91064.408..91064.414 rows=10 loops=1)
         Sort Key: (count(1))
         Sort Method: top-N heapsort  Memory: 25kB
         Buffers: shared hit=4042 read=1542501
         ->  GroupAggregate  (cost=9150031.23..9638422.63 rows=6181404
width=8) (actual time=90892.625..91063.905 rows=617 loops=1)
               Buffers: shared hit=4042 read=1542501
               ->  Sort  (cost=9150031.23..9256675.57 rows=42657736
width=8) (actual time=90877.129..90964.995 rows=124965 loops=1)
                     Sort Key: dim.date_id, fact.sid
                     Sort Method: quicksort  Memory: 8930kB
                     Buffers: shared hit=4042 read=1542501
                     ->  Hash Join  (cost=682.34..3160739.50 rows=42657736
width=8) (actual time=45087.394..90761.624 rows=124965 loops=1)
                           Hash Cond: (fact.date_id = dim.date_id)
                           Buffers: shared hit=4042 read=1542501
                           ->  Seq Scan on fact_data fact
 (cost=0.00..2139866.40 rows=59361340 width=8) (actual
time=0.090..47001.500 rows=59360952 loops=1)
                                 Buffers: shared hit=3752 read=1542501
                           ->  Hash  (cost=518.29..518.29 rows=13124
width=4) (actual time=21.083..21.083 rows=13125 loops=1)
                                 Buckets: 2048  Batches: 1  Memory Usage:
462kB
                                 Buffers: shared hit=290
                                 ->  Seq Scan on dim_date dim
 (cost=0.00..518.29 rows=13124 width=4) (actual time=0.494..10.918
rows=13125 loops=1)
                                       Filter: (date_id >= 5139)
                                       Rows Removed by Filter: 5138
                                       Buffers: shared hit=290
 Total runtime: 91064.496 ms
(25 rows)

Reply via email to