>[email protected] writes:
>> the poster who started this thread had a query where the parsing phase
>> took significantly longer than the planning stage.
> That was an anecdote utterly unsupported by evidence.
> regards, tom lane
The issue of prepared statements having atrocious query plans has hit me again.
I feel very strongly about this topic and about the need for Postgres to have an
option, workable and user friendly, that allows prepared statements to be
re-planned based on their input values. Pardon my bluntness, but in the current
situation the system is brain dead for many important use cases and data sets.
I believe my statement referenced above compared parsing time to the remaining
time, not parsing to planning. Either way, it's a minor detail, and it's not
important for justifying the enhancement proposed here.
Yeah, it's anecdotal from your perspective. Go ahead and ignore all of that if
you wish.
**** I am making several points in this message that are independent of such
evidence, IMHO.
I have tried to rearrange this message so that the anecdotal narrative is at
the end, after the first dashed line.
Unnamed prepared statements do solve much of the problem in theory. The most
common complaint is poor execution plans; the reason to use prepared statements
in the first place is usually to deal cleanly with SQL injection and to write
less bug-prone client code. Parsing being very expensive is rarer, but there IS
a performance savings, not insignificant for many workloads, to be had by
avoiding the utterly avoidable re-parsing.
HOWEVER:
What is overlooked WRT unnamed prepared statements is that they are hard to
use, and changing client code or behavior is difficult, error prone, and
sometimes impossible. Not all client APIs play nicely with them at the moment
(see Postgres' JDBC driver). The behavior has some global tweaks, but these are
useless in the many situations where you need behavior that varies.
Every time the answer to a problem is to change the client behavior, I ask
myself if the DB could have a better default or configuration parameter so that
clients don't have to change. Some database instances have dozens of
applications and hundreds of client types including ad-hoc usage. Some client
code is legacy code that simply can't be changed.
Changing the clients is a lot harder than changing a db parameter, or
configuring a new default for a particular db user. If the client must change,
adding something like SET prepare_level = 'parse_only' is the least intrusive
and easiest to test - but I stress again that in many real-world cases the
client is not flexible.
A session-level parameter that controls prepared statement behavior defaults
(never cache by default? parse cache only? parse cache and plan cache?) would
be a blessing. A DBA could administer a fix to certain problems without having
to force clients to change behavior or wait for new client API versions with
fixes.
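To make that concrete: a sketch of how such a parameter could be deployed,
assuming a hypothetical prepare_level GUC (it does not exist today; the name
is just borrowed from the SET example above):

    -- Hypothetical: 'prepare_level' is not a real parameter today.
    -- A DBA could scope the fix to one problematic application's role:
    ALTER ROLE legacy_app SET prepare_level = 'parse_only';

    -- ...or a single session could opt out of plan caching:
    SET prepare_level = 'parse_only';

Either way, the client code itself never has to change.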
That reminds me: isn't there a global parameter that can force prepared
statements not to be cached? Does that make them all behave as if they were
unnamed, or are they simply re-created each time? I believe I tried it in the
past and the prepared statements were still unable to use the parameter values
for partition selection, which suggests the latter.
Typically, I run into the issue with queries in a back-end process that
operates on large data sets or partitioned tables. Prepared statements
essentially kill performance by several orders of magnitude (think: scanning 1
partition table versus scanning 5000; there is a sketch of this further below).
However, my recent issue is brutally simple.
I need to have an auto-complete web form, and thus need a query with a portion
like
WHERE name LIKE 'userenteredtext%'
Thus, I make an index with varchar_pattern_ops and off we go! ... Or not. It
works fine with explicit queries, but not with a prepared query. Unfortunately,
falling back to interpolating the user's text into the query is highly prone to
SQL injection, and the industry-standard way to deal with that is
parameterization.
http://www.owasp.org/index.php/Guide_to_SQL_Injection
(that site is a wealth of information, tools, and tests on the topic; see for
example http://www.owasp.org/index.php/Testing_for_SQL_Injection).
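For concreteness, here is a minimal sketch of the behavior (table and index
names invented), using a named prepared statement in psql as a stand-in for
the driver-level one:

    -- Table and pattern-ops index for the auto-complete case:
    CREATE TABLE people (name varchar);
    CREATE INDEX people_name_pattern_idx
        ON people (name varchar_pattern_ops);

    -- With a literal prefix, the planner rewrites the LIKE into a
    -- range condition that can use the pattern-ops index:
    EXPLAIN SELECT * FROM people WHERE name LIKE 'userenteredtext%';

    -- A prepared statement is planned once, without the value, so
    -- the planner cannot prove the pattern is a fixed prefix, and
    -- the index goes unused:
    PREPARE autocomplete(varchar) AS
        SELECT * FROM people WHERE name LIKE $1;
    EXPLAIN EXECUTE autocomplete('userenteredtext%');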
Prepared statements are a blessing from a client-code perspective, preventing
all sorts of bugs and catching many others early, on the client side. Not being
able to use them because they cause the database to choose very bad query plans
is a flaw in the database, not the client.
------------------------------------------------------------------------------------
Unnamed prepared statements did not work for me when I tried them as a solution
(8.3.2, supposedly after the fix). I was in a hurry to fix my issue and just
moved on when they plainly were not working. It is possible I did something
wrong back then. They are also poorly supported by many client APIs -- when
they did not work for me, I supposed it was a JDBC issue or perhaps user error,
but maybe it was server side? The documentation on both sides is not entirely
clear about what should really be happening, and I could not figure out how to
debug where the problem was. Can you even test an unnamed prepared statement in
psql?
I had no time to waste, so I changed a lot of the client code instead, and had
since been less interested until the topic came up in this thread -- and then,
after letting this message sit as a draft for a month, I ran into the
"varchar_pattern_ops + prepared statement = index, what index?" issue.
--------
On queries that are overly parse-heavy:
It is very difficult to tease apart parse time from plan time in Postgres
(generally speaking, getting performance data out of Postgres is much harder
than with the commercial DBs I've used). However, my experience with commercial
DBs that keep running counters of time spent in various operations is that
parsing can be upwards of 50% of the total CPU usage with some workloads. I
have seen that with Oracle on decision-support work where many light-weight
queries are executed; Oracle can tell you exactly how much CPU time was spent
parsing versus planning versus other stuff, in aggregate over a time interval.
I don't suspect that Postgres has a parser that is orders of magnitude faster.
But it's true that I don't have direct evidence to share right now that teases
apart plan time from parse time.
So yes, unnamed prepared statements are potentially part (but not all) of a
solution, provided there is good control over their use in client APIs (which
there isn't and won't be, since they are non-standard) or there is a useful,
non-global way to configure them.
Planning takes long on most queries that take long to parse, but this will
vary. (We have several queries that access tables with either only primary-key
indexes or no indexes at all, and that contain lots of embedded 'stuff' that is
naturally more parser-heavy than planner-heavy: CASE statements, and value
constraints/checks/modifiers on columns without indexes that are not in the
WHERE clause. Some are approximately 2k-character behemoths.)
-----
On Performance Improvements gained by avoiding prepared statements:
I gained about a factor of 5x to 50x in performance by changing a lot of code
to avoid prepared statements. Some of the worst cases involved partitioned
tables, where the planning issues are particularly troublesome. (Let's scan
1000 tables for no good reason! Hooray!)
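As a rough sketch of the partitioned-table case (names invented, two
inheritance children standing in for the thousands):

    -- Minimal inheritance-style partitioning:
    CREATE TABLE events (event_date date);
    CREATE TABLE events_2009_01 (CHECK (event_date <  DATE '2009-02-01'))
        INHERITS (events);
    CREATE TABLE events_2009_02 (CHECK (event_date >= DATE '2009-02-01'))
        INHERITS (events);
    SET constraint_exclusion = on;

    -- With a literal, constraint exclusion prunes away the children
    -- whose CHECK constraints rule them out:
    EXPLAIN SELECT * FROM events WHERE event_date = DATE '2009-01-15';

    -- A prepared statement is planned before the value is known, so
    -- every child stays in the plan and gets scanned:
    PREPARE by_day(date) AS SELECT * FROM events WHERE event_date = $1;
    EXPLAIN EXECUTE by_day(DATE '2009-01-15');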
I never posted here about my struggles with prepared statements and execution
performance; I knew it wasn't going to change, and I had to fix the problem in
a few days' time. One of the queries changed had about 7 sub-selects in it, but
the eventual plan can be very fast for some cases, returning almost no data,
and we run this query repeatedly with slightly different parameters. So with
some parameters the execution time dominates by orders of magnitude, and for
most parameter combinations the execution time is almost none of it. Of course,
now we just write a new query for each of the variants, else the performance is
unacceptable. This was not too difficult a change, because these queries are
not a SQL-injection worry, but it has complicated the client code and test
cases.
---
Being able to avoid these problems and let programmers use prepared statements
would be a good thing, so I think a solution more usable and flexible than the
current unnamed prepared statements would be great. And if the avoidable,
redundant re-parsing can be eliminated too, it's win-win.
Being able to cut parsing out of the loop in many cases to improve performance
should be able to stand on its own as a legitimate improvement. If parsing is
permanently bound to planning, it is permanently bound to significant caveats.
Decoupling the two, and providing a means of control over them WRT prepared
statements and related features, has much merit IMO.