Re: [PERFORM] prepared query performs much worse than regular query
On May 21, 2010, at 8:26 PM, Matthew Wakeling wrote: On Fri, 21 May 2010, Richard Yen wrote: Any ideas why the query planner chooses a different query plan when using prepared statements? This is a FAQ. Preparing a statement makes Postgres create a plan, without knowing the values that you will plug in, so it will not be as optimal as if the values were available. The whole idea is to avoid the planning cost each time the query is executed, but if your data is unusual it can result in worse plans. Two things I disagree with. 1. The whole idea is not just to avoid planning cost. It is also to easily avoid SQL injection, reduce query parse time, and to make client code cleaner and more re-usable. 2. The data does not need to be unusual. It just needs to have a skewed distribution. Skewed is not unusual (well, it would be for a primary key :P ). Maybe the planner could note a prepared query parameter is on a high skew column and build a handful of plans to choose from, or just partially re-plan on the skewed column with each execution. Or make it easier for a user to have a prepared statement that re-plans the query each time. Even just a per connection parameter SET prepared.query.cacheplan = FALSE Matthew -- Existence is a convenient concept to designate all of the files that an executable program can potentially process. -- Fortran77 standard -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] prepared query performs much worse than regular query
On Tue, May 25, 2010 at 11:27:08AM -0700, Scott Carey wrote: On May 21, 2010, at 8:26 PM, Matthew Wakeling wrote: On Fri, 21 May 2010, Richard Yen wrote: Any ideas why the query planner chooses a different query plan when using prepared statements? This is a FAQ. Preparing a statement makes Postgres create a plan, without knowing the values that you will plug in, so it will not be as optimal as if the values were available. The whole idea is to avoid the planning cost each time the query is executed, but if your data is unusual it can result in worse plans. Maybe the planner could note a prepared query parameter is on a high skew column and build a handful of plans to choose from, or just partially re-plan on the skewed column with each execution. Or make it easier for a user to have a prepared statement that re-plans the query each time. Even just a per connection parameter SET prepared.query.cacheplan = FALSE There was talk in this year's developers' meeting of doing this replanning you've suggested. (Re(?)plan parameterized plans with actual parameter values on http://wiki.postgresql.org/wiki/PgCon_2010_Developer_Meeting, specificall). This wouldn't show up until at least 9.1, but it's something people are thinking about. -- Joshua Tolley / eggyknap End Point Corporation http://www.endpoint.com signature.asc Description: Digital signature
[PERFORM] prepared query performs much worse than regular query
Hi everyone, I use DBD::Pg to interface with our 8.4.2 database, but for a particular query, performance is horrible. I'm assuming that the behavior of $dbh-prepare is as if I did PREPARE foo AS (query), so I did an explain analyze in the commandline: db_alpha=# prepare foo6 as (SELECT me.id, me.assignment, me.title, me.x_firstname, me.x_lastname, me.owner, me.node, me.grade, me.folder, me.word_count, me.char_length, me.char_count, me.page_count FROM submissions me WHERE ( ( owner = $1 AND me.assignment = $2 ) )); PREPARE db_alpha=# explain analyze execute foo6('-1', '8996557'); QUERY PLAN --- Bitmap Heap Scan on submissions me (cost=38.84..42.85 rows=1 width=70) (actual time=346567.665..346567.665 rows=0 loops=1) Recheck Cond: ((assignment = $2) AND (owner = $1)) - BitmapAnd (cost=38.84..38.84 rows=1 width=0) (actual time=346567.642..346567.642 rows=0 loops=1) - Bitmap Index Scan on submissions_assignment_idx (cost=0.00..19.27 rows=177 width=0) (actual time=0.038..0.038 rows=2 loops=1) Index Cond: (assignment = $2) - Bitmap Index Scan on submissions_owner_idx (cost=0.00..19.32 rows=184 width=0) (actual time=346566.501..346566.501 rows=28977245 loops=1) Index Cond: (owner = $1) Total runtime: 346567.757 ms (8 rows) Now, if I run it without preparing it--just run it directly in the commandline--I get this plan: db_alpha=# explain analyze SELECT me.id, me.assignment, me.title, me.x_firstname, me.x_lastname, me.owner, me.node, me.grade, me.folder, me.word_count, me.char_length, me.char_count, me.page_count FROM submissions me WHERE ( ( owner = -1 AND me.assignment = 8996557 ) ) db_alpha-# ; QUERY PLAN - Index Scan using submissions_assignment_idx on submissions me (cost=0.00..549.15 rows=36 width=70) (actual time=0.021..0.021 rows=0 loops=1) Index Cond: (assignment = 8996557) Filter: (owner = (-1)) Total runtime: 0.042 ms (4 rows) submissions has ~124 million rows, and owner -1 is a placeholder in my database, to fulfill a foreign key requirement. I tried REINDEXing submissions_owner_idx and performing a VACUUM ANALYZE on the submissions table, but nothing seems to make a difference for this query. One other thing to note is that if I use any other value for the owner column, it comes back really fast ( 0.04 ms). Any ideas why the query planner chooses a different query plan when using prepared statements? --Richard -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] prepared query performs much worse than regular query
On Fri, May 21, 2010 at 4:53 PM, Richard Yen d...@richyen.com wrote: Any ideas why the query planner chooses a different query plan when using prepared statements? A prepared plan is the best one the planner can come up with *in general* for the query in question. If the distribution of the values you're querying against -- in your case, owner and assignment -- aren't relatively uniform, that plan is going to be suboptimal, if not downright pathological, for the more outlying-ly distributed values. Looking at your prepared plan, it seems that, on average, there are 177 rows for every assignment, and 184 per owner. As it turns out, though, nearly a quarter of your table has an owner of -1. It's not terribly surprising, with a table that big and a distribution skew of that magnitude, that this query plan, with these arguments, ends up pretty firmly in the pathological category. rls -- :wq -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] prepared query performs much worse than regular query
On Fri, 21 May 2010, Richard Yen wrote: Any ideas why the query planner chooses a different query plan when using prepared statements? This is a FAQ. Preparing a statement makes Postgres create a plan, without knowing the values that you will plug in, so it will not be as optimal as if the values were available. The whole idea is to avoid the planning cost each time the query is executed, but if your data is unusual it can result in worse plans. Matthew -- Existence is a convenient concept to designate all of the files that an executable program can potentially process. -- Fortran77 standard -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance