> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:pgsql-hackers-[EMAIL PROTECTED]] On Behalf Of Tom Lane
> Sent: Thursday, May 10, 2007 11:53 AM
> To: Neil Conway
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Planning large IN lists
>
> Neil Conway <[EMAIL PROTECTED]> writes:
> > When planning queries with a large IN expression in the WHERE clause,
> > the planner transforms the IN list into a scalar array expression. In
> > clause_selectivity(), we estimate the selectivity of the ScalarArrayExpr
> > by calling scalararraysel(), which in turn estimates the selectivity of
> > *each* array element in order to determine the selectivity of the array
> > expression as a whole.
> >
> > This is quite inefficient when the IN list is large.
>
> That's the least of the problems. We really ought to convert such cases
> into an IN (VALUES(...)) type of query, since often repeated indexscans
> aren't the best implementation.
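As a toy illustration of the rewrite Tom describes (the column name and the string-manipulation approach here are hypothetical — a real implementation would transform the parse tree inside the planner, not the query text):

```python
def rewrite_in_list(column, literals):
    """Rewrite "col IN (a, b, c)" as "col IN (VALUES (a), (b), (c))".

    The VALUES form gives the planner a relation to join against, so it
    can consider a hash or merge join instead of repeated index scans.
    """
    values = ", ".join(f"({lit})" for lit in literals)
    return f"{column} IN (VALUES {values})"

# Hypothetical usage:
# rewrite_in_list("customer_id", [1, 2, 3])
#   -> "customer_id IN (VALUES (1), (2), (3))"
```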
It seems to me that if there is a unique index on the IN-list column, the problem is simplified: you only have to estimate at what point repeated index seeks cost more than a table scan. For the average database, the crossover is usually around 5-10% of the table size; I'm not sure how it works out in PostgreSQL. So in the special case of an IN list on a uniquely indexed column, compare the cardinality of the table with the number of IN-list items, and choose a table scan or index seeks based on that.

For arbitrary queries, it seems it would be necessary to keep histograms for the columns in question. Perhaps these could be collected with an advanced ANALYZE option.
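That crossover heuristic can be sketched as follows. The 7.5% threshold is an assumed midpoint of the 5-10% rule of thumb above; PostgreSQL's actual planner weighs random versus sequential page costs rather than applying a fixed fraction:

```python
def choose_access_path(table_rows, in_list_size, crossover=0.075):
    """Pick an access path for an IN list on a uniquely indexed column.

    If the IN-list items would touch more than `crossover` of the
    table's rows (assumed ~7.5% here), repeated index seeks are taken
    to cost more than a single sequential scan.
    """
    if table_rows == 0:
        return "seq_scan"  # empty or unanalyzed table: scanning is trivial
    if in_list_size / table_rows <= crossover:
        return "index_seeks"  # few rows relative to table size
    return "seq_scan"  # fetching this many rows: one scan is cheaper
```

This captures only the special case discussed above (unique index, so each IN item matches at most one row); for non-unique columns the same comparison would need per-value frequency estimates from histograms.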