Re: [PERFORM] prepared query performs much worse than regular query

2010-05-25 Thread Scott Carey

On May 21, 2010, at 8:26 PM, Matthew Wakeling wrote:

 On Fri, 21 May 2010, Richard Yen wrote:
 Any ideas why the query planner chooses a different query plan when using 
 prepared statements?
 
 This is a FAQ. Preparing a statement makes Postgres create a plan, without 
 knowing the values that you will plug in, so it will not be as optimal as 
 if the values were available. The whole idea is to avoid the planning cost 
 each time the query is executed, but if your data is unusual it can 
 result in worse plans.
 

Two things I disagree with.  
1. The whole idea is not just to avoid planning cost.  It is also to easily 
avoid SQL injection, reduce query parse time, and to make client code cleaner 
and more re-usable.
2. The data does not need to be unusual.  It just needs to have a skewed 
distribution.  Skewed is not unusual (well, it would be for a primary key :P ).

Maybe the planner could note a prepared query parameter is on a high skew 
column and build a handful of plans to choose from, or just partially re-plan 
on the skewed column with each execution.  
Or make it easier for a user to have a prepared statement that re-plans the 
query each time.  Even just a per connection parameter SET 
prepared.query.cacheplan = FALSE

 Matthew
 
 -- 
 Existence is a convenient concept to designate all of the files that an
 executable program can potentially process.   -- Fortran77 standard
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] prepared query performs much worse than regular query

2010-05-25 Thread Joshua Tolley
On Tue, May 25, 2010 at 11:27:08AM -0700, Scott Carey wrote:
 On May 21, 2010, at 8:26 PM, Matthew Wakeling wrote:
  On Fri, 21 May 2010, Richard Yen wrote:
  Any ideas why the query planner chooses a different query plan when using 
  prepared statements?
  
  This is a FAQ. Preparing a statement makes Postgres create a plan, without 
  knowing the values that you will plug in, so it will not be as optimal as 
  if the values were available. The whole idea is to avoid the planning cost 
  each time the query is executed, but if your data is unusual it can 
  result in worse plans.
  
 Maybe the planner could note a prepared query parameter is on a high skew
 column and build a handful of plans to choose from, or just partially
 re-plan on the skewed column with each execution.  Or make it easier for a
 user to have a prepared statement that re-plans the query each time.  Even
 just a per connection parameter SET prepared.query.cacheplan = FALSE

There was talk in this year's developers' meeting of doing this replanning
you've suggested. (Re(?)plan parameterized plans with actual parameter
values on http://wiki.postgresql.org/wiki/PgCon_2010_Developer_Meeting,
specificall). This wouldn't show up until at least 9.1, but it's something
people are thinking about.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com


signature.asc
Description: Digital signature


[PERFORM] prepared query performs much worse than regular query

2010-05-21 Thread Richard Yen
Hi everyone,

I use DBD::Pg to interface with our 8.4.2 database, but for a particular query, 
performance is horrible.  I'm assuming that the behavior of $dbh-prepare is as 
if I did PREPARE foo AS (query), so I did an explain analyze in the commandline:
 db_alpha=# prepare foo6 as (SELECT me.id, me.assignment, me.title, 
 me.x_firstname, me.x_lastname, me.owner, me.node, me.grade, me.folder, 
 me.word_count, me.char_length, me.char_count, me.page_count FROM submissions 
 me WHERE ( ( owner = $1 AND me.assignment = $2 ) ));
 PREPARE
 db_alpha=# explain analyze execute foo6('-1', '8996557');
 QUERY 
 PLAN 
 ---
  Bitmap Heap Scan on submissions me  (cost=38.84..42.85 rows=1 width=70) 
 (actual time=346567.665..346567.665 rows=0 loops=1)
Recheck Cond: ((assignment = $2) AND (owner = $1))
-  BitmapAnd  (cost=38.84..38.84 rows=1 width=0) (actual 
 time=346567.642..346567.642 rows=0 loops=1)
  -  Bitmap Index Scan on submissions_assignment_idx  
 (cost=0.00..19.27 rows=177 width=0) (actual time=0.038..0.038 rows=2 loops=1)
Index Cond: (assignment = $2)
  -  Bitmap Index Scan on submissions_owner_idx  (cost=0.00..19.32 
 rows=184 width=0) (actual time=346566.501..346566.501 rows=28977245 loops=1)
Index Cond: (owner = $1)
  Total runtime: 346567.757 ms
 (8 rows)


Now, if I run it without preparing it--just run it directly in the 
commandline--I get this plan:
 db_alpha=# explain analyze SELECT me.id, me.assignment, me.title, 
 me.x_firstname, me.x_lastname, me.owner, me.node, me.grade, me.folder, 
 me.word_count, me.char_length, me.char_count, me.page_count FROM submissions 
 me WHERE ( ( owner = -1 AND me.assignment = 8996557 ) )
 db_alpha-# ;
  QUERY 
 PLAN  
 -
  Index Scan using submissions_assignment_idx on submissions me  
 (cost=0.00..549.15 rows=36 width=70) (actual time=0.021..0.021 rows=0 loops=1)
Index Cond: (assignment = 8996557)
Filter: (owner = (-1))
  Total runtime: 0.042 ms
 (4 rows)

submissions has ~124 million rows, and owner -1 is a placeholder in my 
database, to fulfill a foreign key requirement.  I tried REINDEXing 
submissions_owner_idx and performing a VACUUM ANALYZE on the submissions table, 
but nothing seems to make a difference for this query.  One other thing to note 
is that if I use any other value for the owner column, it comes back really 
fast ( 0.04 ms).

Any ideas why the query planner chooses a different query plan when using 
prepared statements?

--Richard
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] prepared query performs much worse than regular query

2010-05-21 Thread Rosser Schwarz
On Fri, May 21, 2010 at 4:53 PM, Richard Yen d...@richyen.com wrote:
 Any ideas why the query planner chooses a different query plan when using 
 prepared statements?

A prepared plan is the best one the planner can come up with *in
general* for the query in question.  If the distribution of the values
you're querying against -- in your case, owner and assignment --
aren't relatively uniform, that plan is going to be suboptimal, if not
downright pathological, for the more outlying-ly distributed values.

Looking at your prepared plan, it seems that, on average, there are
177 rows for every assignment, and 184 per owner.  As it turns
out, though, nearly a quarter of your table has an owner of -1.
It's not terribly surprising, with a table that big and a distribution
skew of that magnitude, that this query plan, with these arguments,
ends up pretty firmly in the pathological category.

rls

-- 
:wq

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] prepared query performs much worse than regular query

2010-05-21 Thread Matthew Wakeling

On Fri, 21 May 2010, Richard Yen wrote:

Any ideas why the query planner chooses a different query plan when using 
prepared statements?


This is a FAQ. Preparing a statement makes Postgres create a plan, without 
knowing the values that you will plug in, so it will not be as optimal as 
if the values were available. The whole idea is to avoid the planning cost 
each time the query is executed, but if your data is unusual it can 
result in worse plans.


Matthew

--
Existence is a convenient concept to designate all of the files that an
executable program can potentially process.   -- Fortran77 standard

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance