Re: [PERFORM] hash semi join caused by IN (select ...)
Hi, Does anybody know why the planner treats = ANY(ARRAY(select ...)) differently than IN(select ...)? Which one is preferable, when I already have a lot of joins? Thanks, Clemens 2011/5/17 Clemens Eisserer linuxhi...@gmail.com: Hi, select from t1 left join t2 WHERE id IN (select ) Does it work as expected with one less join? If so, try increasing join_collapse_limit ... That did the trick - thanks a lot. I only had to increase join_collapse_limit a bit and now get an almost perfect plan. Instead of hash-joining all the data, the planner generates nested-loop-joins with index only on the few rows I fetch. Using = ANY(array(select... )) also seems to work, I wonder which one works better. Does ANY(ARRAY(...)) force the optimizer to plan the subquery seperated from the main query? Thanks, Clemens -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] hash semi join caused by IN (select ...)
On Wed, May 18, 2011 at 1:46 AM, Clemens Eisserer linuxhi...@gmail.comwrote: Hi, Does anybody know why the planner treats = ANY(ARRAY(select ...)) differently than IN(select ...)? Which one is preferable, when I already have a lot of joins? Thanks, Clemens 2011/5/17 Clemens Eisserer linuxhi...@gmail.com: Hi, select from t1 left join t2 WHERE id IN (select ) Does it work as expected with one less join? If so, try increasing join_collapse_limit ... That did the trick - thanks a lot. I only had to increase join_collapse_limit a bit and now get an almost perfect plan. Instead of hash-joining all the data, the planner generates nested-loop-joins with index only on the few rows I fetch. Using = ANY(array(select... )) also seems to work, I wonder which one works better. Does ANY(ARRAY(...)) force the optimizer to plan the subquery seperated from the main query? Thanks, Clemens -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance I'm just a user so I don't have definitive knowledge of this, but my experience seems to indicate that the = ANY(ARRAY(SELECT ...)) does the select and turns it into an array and then uses that in the where clause in a manner similar to a hard coded list of values, like IN (1, 2, 3, ...). In theory, the planner could do the same sort of things with the IN (SELECT ...) but my experience seems to indicate that in some cases it decides not to use an index that it could. One specific example I know of is that at least in PostgreSQL 8.3, a view with a UNION/UNION ALL will push the = ANY(ARRAY(SELECT ...)) down into the two sub-queries, but the IN (SELECT ...) will be applied after the UNION ALL. Dave
[PERFORM] LIMIT and UNION ALL
I am using Postgres 8.3.3 and I have a VIEW which is a UNION ALL of two tables but when I do a select on the view using a LIMIT, it scans the entire tables and takes significantly longer than writing out the query with the LIMITs in the sub-queries themselves. Is there a solution to get the view to perform like the query with the LIMIT explicitly placed in the sub-queries? I noticed a similar question in this post http://archives.postgresql.org/pgsql-general/2005-06/msg00488.php but I wasn't able to find an answer. Thanks, Dave
Re: [PERFORM] LIMIT and UNION ALL
On Wed, May 18, 2011 at 5:26 PM, Dave Johansen davejohan...@gmail.com wrote: I am using Postgres 8.3.3 and I have a VIEW which is a UNION ALL of two tables but when I do a select on the view using a LIMIT, it scans the entire tables and takes significantly longer than writing out the query with the LIMITs in the sub-queries themselves. Is there a solution to get the view to perform like the query with the LIMIT explicitly placed in the sub-queries? Can you show DDL and queries? The query with the LIMIT on the subqueries and the one with the LIMIT on the overall query are not semantically equivalent. Since you can have an ORDER BY before the LIMIT on the query with the limit on the view the database must have all the rows before it can apply the ordering and properly determine the limit. Although it might be possible to determine under particular circumstances that only one of the tables needs to be queried or tables need only be queried partially I deem that quite complex. I do not know whether Postgres can do such optimizations but for that we would certainly need to see the concrete example (including constraint and indexes). Kind regards robert -- remember.guy do |as, often| as.you_can - without end http://blog.rubybestpractices.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] LIMIT and UNION ALL
Hello 2011/5/18 Dave Johansen davejohan...@gmail.com: I am using Postgres 8.3.3 and I have a VIEW which is a UNION ALL of two tables but when I do a select on the view using a LIMIT, it scans the entire tables and takes significantly longer than writing out the query with the LIMITs in the sub-queries themselves. Is there a solution to get the view to perform like the query with the LIMIT explicitly placed in the sub-queries? I noticed a similar question in this post http://archives.postgresql.org/pgsql-general/2005-06/msg00488.php but I wasn't able to find an answer. maybe SELECT * FROM (SELECT * FROM tab1 LIMIT n) s1 UNION ALL SELECT * FROM (SELECT * FROM tab2 LIMIT n) s2 LIMIT n Regards Pavel Stehule Thanks, Dave -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] hash semi join caused by IN (select ...)
On 5/17/11 12:38 AM, Clemens Eisserer linuxhi...@gmail.com wrote: Hi, select from t1 left join t2 WHERE id IN (select ) Does it work as expected with one less join? If so, try increasing join_collapse_limit ... That did the trick - thanks a lot. I only had to increase join_collapse_limit a bit and now get an almost perfect plan. Instead of hash-joining all the data, the planner generates nested-loop-joins with index only on the few rows I fetch. Using = ANY(array(select... )) also seems to work, I wonder which one works better. Does ANY(ARRAY(...)) force the optimizer to plan the subquery seperated from the main query? I'm not sure exactly what happens with ANY(ARRAY()). I am fairly confident that the planner simply transforms an IN(select ...) to a join, since they are equivalent. Because foo IN (select ...) is just a join, it counts towards join_collapse_limit. Thanks, Clemens -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
Jim Nasby wrote: I think the challenge there would be how to define the scope of the hot-spot. Is it the last X pages? Last X serial values? Something like correlation? Hmm... it would be interesting if we had average relation access times for each stats bucket on a per-column basis; that would give the planner a better idea of how much IO overhead there would be for a given WHERE clause You've already given one reasonable first answer to your question here. If you defined a usage counter for each histogram bucket, and incremented that each time something from it was touched, that could lead to a very rough way to determine access distribution. Compute a ratio of the counts in those buckets, then have an estimate of the total cached percentage; multiplying the two will give you an idea how much of that specific bucket might be in memory. It's not perfect, and you need to incorporate some sort of aging method to it (probably weighted average based), but the basic idea could work. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance