One of the few situations where I experience poor performance under PostgreSQL, compared to other, commercial databases, is when an EXISTS predicate is used. Actually, these often do perform quite well, but there are some situations where there are optimizations available which other products detect and use, which PostgreSQL misses. Looking back at the mailing list archives, it appears that optimizations similar to what other products use for both IN and EXISTS were added to the IN predicate circa 2003: http://archives.postgresql.org/pgsql-hackers/2003-08/msg00438.php -Make IN/NOT IN have similar performance to EXISTS/NOT EXISTS (Tom) There are many situations where the EXISTS predicate and an IN predicate with a table subquery produce identical results. After reviewing the SQL standard and thinking about it a bit, I think that the equivalence holds unless: (1) the equivalent IN predicate is capable of producing a result of UNKNOWN, and (2) the predicate is used in a context where the difference between UNKNOWN and FALSE is significant. One additional point: Martijn van Oosterhout pointed out in an earlier email that if the subquery contains any non-immutable functions there could be a difference, although he described that issue as "minor". The most common case for UNKNOWN logical values is when comparisons involve a NULL on either or both sides of an operator. In some cases, such as the one I posted about a month ago, it is quickly clear to a human reader that none of the above conditions exist -- the columns are all NOT NULL (so the result of the comparisons can never be UNKNOWN), they are used in a context where UNKNOWN and FALSE produce the same results, and no non-immutable functions are invoked. http://archives.postgresql.org/pgsql-hackers/2007-03/msg01408.php The plan for the EXISTS predicate (running in 8.2.3) has a cost of 72736.37, while the logically equivalent query using IN has a cost of 36.38 (and these do approximate reality), so if the faster option was visible to the planner, it would be chosen. Some would argue that I should just change the query to use IN. There are three problems with that. (1) It requires the use of a multi-value row value constructor, which is a construct not supported by all databases we currently use. (We have a heterogeneous environment, where the same queries must run on multiple platforms.) We have a somewhat ugly but valid query to use as a workaround which runs on all of our databases; it has a PostgreSQL cost of 130.98, which is tolerable, if not optimal. (2) I have seen a number of cases where the logically equivalent EXISTS predicate performs better than the IN predicate. The failure to recognize equivalence and to cost both approaches risks suboptimal performance. Avoiding that requires careful testing of both forms to coerce the planner into choosing the best plan. (3) I have been trying to move our application programmers away from a focus on how they want to navigate through the data, toward declaring what they want as the result. (Some programmers routinely use cursors to navigate each table, row by row, based on what they think is the best plan, stuffing the data into a temporary table as they go, then selecting from the temporary table, when a single SELECT statement with a few subqueries will produce the desired data.) The current situation with these predicates diverts the focus from "what to show" back to "how to get it". I hate to see any queries run slower on PostgreSQL than on other databases, so I'm suggesting we address this. We are talking about an optimization that I've seen in some other products for at least 15 years. -Kevin
---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate