Re: [HACKERS] [SQL] "SELECT IN" Still Broken in 7.4b

Tom Lane Thu, 21 Aug 2003 13:44:54 -0700

Stephan Szabo <[EMAIL PROTECTED]> writes:
> Within the scope of the new hashed IN stuff I believe so in at least some
> cases.  I have a few million row table of integers where searching for
> values IN (~10000 values) takes longer than creating the temp table,
> copying into it and doing the in subquery.


I did some profiling and soon realized that the main problem is the
executor's trick for not returning the same row multiple times in a
multiple-indexscan plan node.  The point is that given
        WHERE a = 1 OR b = 1
you could create a plan that first indexscans on a, then indexscans on
b --- but you mustn't return any tuples in the second scan that you
already returned in the first.  IndexNext solves this by evaluating the
prior-scan index conditions to see if they are true.  Which is okay if
there aren't too many of them.  But when you have an N-element IN list
this means you are going to do O(N^2) index expression evaluations.
In the 10000-element IN-list test case, ExecQual gets called almost
50 million times :-(

I'm toying with the notion of ripping out that logic and instead
building an in-memory hashtable of already-returned TIDs.  This could
theoretically run out of memory if the multiple indexscan returns enough
tuples, but I think in practice that wouldn't happen because the planner
wouldn't choose an indexscan when very large numbers of tuples are being
selected.

Comments?

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] [SQL] "SELECT IN" Still Broken in 7.4b

Reply via email to