Stephan Szabo <[EMAIL PROTECTED]> writes: > Within the scope of the new hashed IN stuff I believe so in at least some > cases. I have a few million row table of integers where searching for > values IN (~10000 values) takes longer than creating the temp table, > copying into it and doing the in subquery.
I did some profiling and soon realized that the main problem is the executor's trick for not returning the same row multiple times in a multiple-indexscan plan node. The point is that given WHERE a = 1 OR b = 1 you could create a plan that first indexscans on a, then indexscans on b --- but you mustn't return any tuples in the second scan that you already returned in the first. IndexNext solves this by evaluating the prior-scan index conditions to see if they are true. Which is okay if there aren't too many of them. But when you have an N-element IN list this means you are going to do O(N^2) index expression evaluations. In the 10000-element IN-list test case, ExecQual gets called almost 50 million times :-( I'm toying with the notion of ripping out that logic and instead building an in-memory hashtable of already-returned TIDs. This could theoretically run out of memory if the multiple indexscan returns enough tuples, but I think in practice that wouldn't happen because the planner wouldn't choose an indexscan when very large numbers of tuples are being selected. Comments? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])