>>> Tom Lane <[EMAIL PROTECTED]> wrote: > The examples that Kevin Grittner put up awhile back included several > uses of EXISTS() in places where it couldn't be turned into a semijoin, > eg in the query's targetlist. I was musing a bit about whether we could > improve those scenarios. I would like to get 8.4 to the point where we > could say as a blanket performance recommendation "prefer EXISTS over > IN". The semantic gotchas associated with NOT IN make it hard to > optimize well, not to mention being a perennial bane of novices; so if > we could just point people in the other direction without qualification > I think we'd be better off. Agreed. > So ... I'm wondering if this actually touches anyone's hot-button, > or if we should just file it in the overflowing pile of Things That > Might Be Nice To Do Someday. > > Comments? I'm in the position of trying to influence programmers here to write queries using set logic. Way too many of the queries here are coded with a cursor for a "primary" select, with a bunch of lower level cursors to navigate around and get the related rows one at a time. Results are often stuck into a work table as this progresses, with the work table massaged a bit here and there in this procedural process, and the final results selected out. It should surprise nobody here that this is not fast to write, easy to maintain, efficient to run, or generally free from subtle errors. I point out that they should write queries which state what they want, regardless of how complex those rules are, instead of writing how to get it. The optimizer, I argue, has tricks available which they don't. Usually, a rewrite into set logic has a fraction of the number of lines, runs much faster, and loses a bug or two that was hidden within the procedural spaghetti. On the other hand, sometimes they write a perfectly good "set logic" query (from the point of view of stating what they want), and the optimizer falls down, and I have to come in and say "Oh, it has trouble with EXISTS; you can use IN here." When I tell them to use IN instead of EXISTS, then I need to have all these caveats about the sizes of tables and the possibilities of NULL on the NOT EXISTS. At this point I tend to lose a big part of my audience. So I'd be very happy to see this work done, not because I can't find a workaround, but because trying to teach all the programmers tricky hand-optimizations is a losing battle, and if I lose that battle the queries degenerate into spaghetti-land. As with others, I can't say where this fits on a priority list, but I would hate to see it drift off onto a "someday" list. -Kevin
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers