Josh Berkus <[EMAIL PROTECTED]> writes:
> Thus, I need to select:

> SELECT Data FROM Table A 
> WHERE CaseID NOT IN (
>       SELECT CaseID FROM Table_B, Table_C
>       WHERE Table_B.GroupID = TableC.GroupID
>               AND TableC.AccountID = 11)

> The problem is, since Table_B and Table_C are large (10,000 records +)
> this exclusion query takes several *minutes* to run.

I don't think there is any good way to make this fast in current
sources.  A partial workaround is to use a temp table:

SELECT CaseID INTO TEMP TABLE mycaseids FROM Table_B, Table_C
        WHERE Table_B.GroupID = TableC.GroupID
                AND TableC.AccountID = 11;

CREATE INDEX mycaseids_idx ON mycaseids(caseid);  -- critical!

SELECT Data FROM TableA upper
WHERE NOT EXISTS (select 1 from mycaseids where caseid = upper.caseid);

You'd need to check with EXPLAIN, but the EXISTS subplan should make
use of the index to probe the temp table, so you get one index lookup
per outer tuple.  Better than a complete scan of the subselect outputs,
which is what you'll get with the NOT IN style.


In 7.1 it'll be possible to do this with an outer join, which should
be a lot quicker:

SELECT Data FROM TableA LEFT JOIN
(SELECT CaseID FROM Table_B, Table_C
 WHERE Table_B.GroupID = TableC.GroupID
        AND TableC.AccountID = 11) subselect
ON (tablea.caseid = subselect.caseid)
WHERE subselect.caseid IS NULL;

ie, do the outer join and then discard the successfully-matched rows.


Further down the pike, we have plans to make the system smart enough to
transform IN and NOT IN constructs into join-like queries automatically.
Right now, though, they're best rewritten into something else when
performance is important.

                        regards, tom lane

Reply via email to