Tom Lane wrote:
I've been thinking about how to improve the performance of queries using "WHERE x IN (subselect)" and "WHERE x NOT IN (subselect)".In the existing implementation, the subquery result is rescanned to look for a match to x each time the WHERE clause is executed; this essentially makes it work like a nestloop join of the stupidest variety. (We do stick a Materialize node atop the subselect if it looks complicated, but that's not a big help in typical cases.) I've thought of three alternative implementations that would perform better in various scenarios. Each would be relatively simple to implement; the problem I'm having is figuring out how to get the planner to choose the best one. The alternatives are basically:
[abbreviated]
1. Add FROM item 2. Hash-based 3. Inner indexscan 4. the existing implementation
How about starting with a rule-based method to make the choice?The difficulty is that it's not clear how to choose one of these four ways, short of planning the *entire* query from scratch all four ways :-(. This seems pretty grim. Approaches #2 and #3 could be handled as local transformations of the WHERE clause, but we couldn't choose which to use very well if we don't yet know how many outer rows the WHERE clause will be executed for. Approach #1 is really planning a completely different query --- one with an extra FROM-item --- and there's not going to be all that much commonality in the computations, unless we restrict where the added FROM-item can be joined to the others, which'd more or less defeat the purpose. Anyone see a way around this difficulty?
1. If uncorrelated: use hash-based approach - ISTM this might address a large
percentage of the problem cases -- it could even handle the
"IN (list-of-scalars)" case. Could it fall back to a
tuplesort/binary-search for the too many to hash in memory case?
2. If correlated: use an inner indexscan
3. If you come up with a pattern where none of the approaches produce a
correct answer, use the existing implementation
You could always get fancier later if needed, but something along these lines would be a great start.
Joe
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])