Tom Lane wrote:
I've been thinking about how to improve the performance of queries using
"WHERE x IN (subselect)" and "WHERE x NOT IN (subselect)".

In the existing implementation, the subquery result is rescanned to look
for a match to x each time the WHERE clause is executed; this essentially
makes it work like a nestloop join of the stupidest variety.  (We do
stick a Materialize node atop the subselect if it looks complicated, but
that's not a big help in typical cases.)

I've thought of three alternative implementations that would perform
better in various scenarios.  Each would be relatively simple to
implement; the problem I'm having is figuring out how to get the planner
to choose the best one.  The alternatives are basically:

[abbreviated]
1. Add FROM item
2. Hash-based
3. Inner indexscan
4. the existing implementation

The difficulty is that it's not clear how to choose one of these four
ways, short of planning the *entire* query from scratch all four ways :-(.
This seems pretty grim.  Approaches #2 and #3 could be handled as local
transformations of the WHERE clause, but we couldn't choose which to use
very well if we don't yet know how many outer rows the WHERE clause will
be executed for.  Approach #1 is really planning a completely different
query --- one with an extra FROM-item --- and there's not going to be
all that much commonality in the computations, unless we restrict where
the added FROM-item can be joined to the others, which'd more or less
defeat the purpose.

Anyone see a way around this difficulty?
How about starting with a rule-based method to make the choice?

1. If uncorrelated: use hash-based approach - ISTM this might address a large
percentage of the problem cases -- it could even handle the
"IN (list-of-scalars)" case. Could it fall back to a
tuplesort/binary-search for the too many to hash in memory case?
2. If correlated: use an inner indexscan
3. If you come up with a pattern where none of the approaches produce a
correct answer, use the existing implementation

You could always get fancier later if needed, but something along these lines would be a great start.

Joe


---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Reply via email to