> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:pgsql-hackers-[EMAIL PROTECTED]] On Behalf Of Tom Lane
> Sent: Thursday, May 10, 2007 11:53 AM
> To: Neil Conway
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Planning large IN lists
>
> Neil Conway <[EMAIL PROTECTED]> writes:
> > When planning queries with a large IN expression in the WHERE clause,
> > the planner transforms the IN list into a scalar array expression. In
> > clause_selectivity(), we estimate the selectivity of the ScalarArrayExpr
> > by calling scalararraysel(), which in turn estimates the selectivity of
> > *each* array element in order to determine the selectivity of the array
> > expression as a whole.
> >
> > This is quite inefficient when the IN list is large.
>
> That's the least of the problems. We really ought to convert such cases
> into an IN (VALUES(...)) type of query, since often repeated indexscans
> aren't the best implementation.
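As a toy illustration of the rewrite Tom describes (the column name and the string-manipulation approach here are hypothetical — a real implementation would transform the parse tree inside the planner, not the query text):

```python
def rewrite_in_list(column, literals):
    """Rewrite "col IN (a, b, c)" as "col IN (VALUES (a), (b), (c))".

    The VALUES form gives the planner a relation to join against, so it
    can consider a hash or merge join instead of repeated index scans.
    """
    values = ", ".join(f"({lit})" for lit in literals)
    return f"{column} IN (VALUES {values})"

# Hypothetical usage:
# rewrite_in_list("customer_id", [1, 2, 3])
#   -> "customer_id IN (VALUES (1), (2), (3))"
```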
It seems to me that if there is a unique index on the IN-list column, the problem is simplified: you only have to estimate at what point repeated index seeks cost more than a table scan. For the average database, the crossover is usually around 5-10% of the table size; I'm not sure how it works out in PostgreSQL. So in the special case of an IN list on a uniquely indexed column, compare the cardinality of the table with the number of IN-list items, and choose a table scan or index seeks based on that.

For arbitrary queries, it seems it would be necessary to keep histograms for the columns in question. Perhaps these could be collected with an advanced ANALYZE option.
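That crossover heuristic can be sketched as follows. The 7.5% threshold is an assumed midpoint of the 5-10% rule of thumb above; PostgreSQL's actual planner weighs random versus sequential page costs rather than applying a fixed fraction:

```python
def choose_access_path(table_rows, in_list_size, crossover=0.075):
    """Pick an access path for an IN list on a uniquely indexed column.

    If the IN-list items would touch more than `crossover` of the
    table's rows (assumed ~7.5% here), repeated index seeks are taken
    to cost more than a single sequential scan.
    """
    if table_rows == 0:
        return "seq_scan"  # empty or unanalyzed table: scanning is trivial
    if in_list_size / table_rows <= crossover:
        return "index_seeks"  # few rows relative to table size
    return "seq_scan"  # fetching this many rows: one scan is cheaper
```

This captures only the special case discussed above (unique index, so each IN item matches at most one row); for non-unique columns the same comparison would need per-value frequency estimates from histograms.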