>[email protected] writes:
>> the poster who started this thread had a query where the parsing phase
>> took significantly longer than the planning stage.
> That was an anecdote utterly unsupported by evidence.
> regards, tom lane
The issue of prepared statements having atrocious query plans has hit me again.
I feel very strongly about this topic and about the need for Postgres to have an
option, workable and user friendly, that allows prepared statements to be
re-planned based on their input values. Pardon my bluntness, but in the current
situation the system is brain dead for many important use cases and data sets.
I believe my statement referenced above compared parsing time to the remaining
time, not parsing to planning. Either way, it's a minor detail, and it's not
important for justifying the enhancement proposed here.
Yeah, it's anecdotal from your perspective. Go ahead and ignore all of that if
you wish.
**** I am making several points in this message that are independent of such
evidence, IMHO.
I have tried to rearrange this message so that the anecdotal narrative is at
the end, after the first dashed line.
Unnamed prepared statements do solve much of the problem in theory. The most
common complaint is poor execution plans; the reason to use prepared statements
in the first place is usually to deal cleanly with SQL injection and to write
less bug-prone client code. Parsing being very expensive is rarer, but there IS
a performance savings, not insignificant for many workloads, to be had by
avoiding the utterly avoidable re-parsing.
HOWEVER:
What is overlooked WRT unnamed prepared statements is that they are hard to
use, and changing client code or behavior is difficult, error prone, and
sometimes impossible. Not all client APIs play nicely with them at the moment
(see Postgres' JDBC driver). The behavior has some global tweaks, but these are
useless in the many situations where you need behavior that varies.
Every time the answer to a problem is to change the client behavior, I ask
myself if the DB could have a better default or configuration parameter so that
clients don't have to change. Some database instances have dozens of
applications and hundreds of client types including ad-hoc usage. Some client
code is legacy code that simply can't be changed.
Changing the clients is a lot harder than changing a db parameter, or
configuring a new default for a particular db user. If the client must change,
adding something like SET prepare_level = 'parse_only' is the least intrusive
and easiest to test - but I stress again that in many real-world cases the
client is not flexible.
A session-level parameter that controls prepared statement behavior defaults
(never cache by default? parse cache only? parse cache and plan cache?) would
be a blessing. A DBA could administer a fix to certain problems without having
to force clients to change behavior or wait for new client API versions with
fixes.
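To make that concrete: a sketch of how such a parameter could be deployed,
assuming a hypothetical prepare_level GUC (it does not exist today; the name
is just borrowed from the SET example above):

    -- Hypothetical: 'prepare_level' is not a real parameter today.
    -- A DBA could scope the fix to one problematic application's role:
    ALTER ROLE legacy_app SET prepare_level = 'parse_only';

    -- ...or a single session could opt out of plan caching:
    SET prepare_level = 'parse_only';

Either way, the client code itself never has to change.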
That reminds me: isn't there a global parameter that can force prepared
statements not to be cached? Does that make them all behave as if they were
unnamed, or are they simply re-created each time? I believe I tried it in the
past and the prepared statements were still unable to use the parameter values
for partition selection, which suggests the latter.
Typically, I run into the issue with queries in a back-end process that
operates on large data sets or partitioned tables. Prepared statements
essentially kill performance by several orders of magnitude (think: scanning 1
partition table versus scanning 5000; there is a sketch of this further below).
However, my recent issue is brutally simple.
I need to have an auto-complete web form, and thus need a query with a portion
like
WHERE name LIKE 'userenteredtext%'
Thus, I make an index with varchar_pattern_ops and off we go! ... Or not. It
works fine with explicit queries, but not with a prepared query. Unfortunately,
falling back to interpolating the user's text into the query is highly prone to
SQL injection, and the industry-standard way to deal with that is
parameterization.
http://www.owasp.org/index.php/Guide_to_SQL_Injection
(that site is a wealth of information, tools, and tests on the topic; see for
example http://www.owasp.org/index.php/Testing_for_SQL_Injection).
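For concreteness, here is a minimal sketch of the behavior (table and index
names invented), using a named prepared statement in psql as a stand-in for
the driver-level one:

    -- Table and pattern-ops index for the auto-complete case:
    CREATE TABLE people (name varchar);
    CREATE INDEX people_name_pattern_idx
        ON people (name varchar_pattern_ops);

    -- With a literal prefix, the planner rewrites the LIKE into a
    -- range condition that can use the pattern-ops index:
    EXPLAIN SELECT * FROM people WHERE name LIKE 'userenteredtext%';

    -- A prepared statement is planned once, without the value, so
    -- the planner cannot prove the pattern is a fixed prefix, and
    -- the index goes unused:
    PREPARE autocomplete(varchar) AS
        SELECT * FROM people WHERE name LIKE $1;
    EXPLAIN EXECUTE autocomplete('userenteredtext%');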
Prepared statements are a blessing from a client-code perspective, preventing
all sorts of bugs and catching many others early, on the client side. Not being
able to use them because they cause the database to choose very bad query plans
is a flaw in the database, not the client.
------------------------------------------------------------------------------------
Unnamed prepared statements did not work for me when I tried them as a solution
(8.3.2, supposedly after the fix). I was in a hurry to fix my issue and just
moved on when they plainly were not working. It is possible I did something
wrong back then. They are also poorly supported by many client APIs -- when
they did not work for me, I supposed it was a JDBC issue or perhaps user error,
but maybe it was server side? The documentation on both sides is not entirely
clear about what should really be happening, and I could not figure out how to
debug where the problem was. Can you even test an unnamed prepared statement in
psql?
I had no time to waste, so I changed a lot of the client code instead, and had
since been less interested until the topic came up in this thread -- and then,
after letting this message sit as a draft for a month, I ran into the
"varchar_pattern_ops + prepared statement = index, what index?" issue.
--------
On queries that are overly parse-heavy:
It is very difficult to tease apart parse time from plan time in Postgres
(generally speaking, getting performance data out of Postgres is much harder
than with the commercial DBs I've used). However, my experience with commercial
DBs that keep running counters of time spent in various operations is that
parsing can be upwards of 50% of the total CPU usage with some workloads. I
have seen that with Oracle on decision-support work where many light-weight
queries are executed; Oracle can tell you exactly how much CPU time was spent
parsing versus planning versus other stuff, in aggregate over a time interval.
I don't suspect that Postgres has a parser that is orders of magnitude faster.
But it's true that I don't have direct evidence to share right now that teases
apart plan time from parse time.
So yes, unnamed prepared statements are potentially part (but not all) of a
solution, provided there is good control over their use in client APIs (which
there isn't and won't be, since they are non-standard) or there is a useful,
non-global way to configure them.
Planning takes long on most queries that take long to parse, but this will
vary. (We have several queries that access tables with either only primary-key
indexes or no indexes at all, and that contain lots of embedded 'stuff' that is
naturally more parser-heavy than planner-heavy: CASE statements, and value
constraints/checks/modifiers on columns without indexes that are not in the
WHERE clause. Some are approximately 2k-character behemoths.)
-----
On Performance Improvements gained by avoiding prepared statements:
I gained about a factor of 5x to 50x in performance by changing a lot of code
to avoid prepared statements. Some of the worst cases involved partitioned
tables, where the planning issues are particularly troublesome. (Let's scan
1000 tables for no good reason! Hooray!)
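As a rough sketch of the partitioned-table case (names invented, two
inheritance children standing in for the thousands):

    -- Minimal inheritance-style partitioning:
    CREATE TABLE events (event_date date);
    CREATE TABLE events_2009_01 (CHECK (event_date <  DATE '2009-02-01'))
        INHERITS (events);
    CREATE TABLE events_2009_02 (CHECK (event_date >= DATE '2009-02-01'))
        INHERITS (events);
    SET constraint_exclusion = on;

    -- With a literal, constraint exclusion prunes away the children
    -- whose CHECK constraints rule them out:
    EXPLAIN SELECT * FROM events WHERE event_date = DATE '2009-01-15';

    -- A prepared statement is planned before the value is known, so
    -- every child stays in the plan and gets scanned:
    PREPARE by_day(date) AS SELECT * FROM events WHERE event_date = $1;
    EXPLAIN EXECUTE by_day(DATE '2009-01-15');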
I never posted here about my struggles with prepared statements and execution
performance; I knew it wasn't going to change, and I had to fix the problem in
a few days' time. One of the queries changed had about 7 sub-selects in it, but
the eventual plan can be very fast for some cases, returning almost no data,
and we run this query repeatedly with slightly different parameters. So with
some parameters the execution time dominates by orders of magnitude, and for
most parameter combinations the execution time is almost none of it. Of course,
now we just write a new query for each of the variants, else the performance is
unacceptable. This was not too difficult a change, because these queries are
not a SQL-injection worry, but it has complicated the client code and test
cases.
---
Being able to avoid these problems and let programmers use prepared statements
would be a good thing, so I think a solution more usable and flexible than the
current unnamed prepared statements would be great. And if the avoidable,
redundant re-parsing can be eliminated too, it's win-win.
Being able to cut parsing out of the loop in many cases to improve performance
should be able to stand on its own as a legitimate improvement. If parsing is
permanently bound to planning, it is permanently bound to significant caveats.
Decoupling the two, and providing a means of control over them WRT prepared
statements and related features, has much merit IMO.