I have a situation that can be summarized to the following: 

-- day in 20061215 format 
Create table calendar (
        day integer unique not null
        ); 

Create table customers (
        id serial unique not null, 
        name varchar, 
        address varchar, 
        ); 

Create table deliveries (
        customers_id integer not null references customers(id), 
        calendar_day integer not null references calendar(day), 
        delivered bool not null default false, 
        unique(customers_id, calendar_id)
        ); 
        
Imagine tens of thousands of customers, a few million deliveries. A query 
that's structurally similar to the following query is rather slow. It's 
taking over 30 seconds, climbing fast on reasonable HW. (SMP Opteron, 10k 
SCSI, 4 GB RAM) If I remove the outer join, performance is < 1 second. 

SELECT customers.id as customers_id, 
        customers.name AS customers_name, 
        calendar.day AS calendar_day, 
        CASE WHEN (deliveries.delivered IS NULL) THEN 'n/a' 
                WHEN (deliveries.delivered=TRUE) THEN 'yes'
                ELSE 'no' END AS delivered
        FROM customers 
        JOIN calendars ON 
                (
                -- GIVE A CALENDAR OF POSSIBLE DAYS FOR DELIVERIES 
                calendar.day < 20061201
                AND calendar.day >= 20060101
                ) 
        LEFT OUTER JOIN deliveries ON 
                ( 
                customers.id=deliveries.customers_id 
                AND deliveries.calendar_day=calendar.day 
                ) 
        ; 

What can I do to improve the performance of this oft-used query? Is there a 
better way to do this, or am I doomed to looping thru results and parsing the 
results in code? 

Thanks, 

-Ben 

-- 
"I kept looking around for somebody to solve the problem. 
Then I realized I am somebody" 
   -Anonymous

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Reply via email to