Re: [HACKERS] Avoiding bad prepared-statement plans.

Mark Mielke Fri, 26 Feb 2010 00:13:58 -0800

My preference is to deal with the specific value vs generic value issue.

For this issue, it can affect performance even if PREPARE/EXECUTE is executed exactly once.

In the last case I saw, a certain query was executing once every second, and with a specific value it would take < 1 ms, and with a generic value it would take > 50 ms. That's 5% system load for one CPU core to do nothing. After analysis, it was clearly a "common value" vs "not common value" problem. For this particular table, it stored an integer, but only used two values across something like 100k rows. The query was for a third value that did not exist. The difference was a sequential scan vs an index lookup.
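To make the "common value" vs "not common value" effect concrete, here is a rough sketch in Python. It is a toy cost model and not PostgreSQL's planner: the function names, the cost constants, and the assumption that the two values split the table evenly are all made up for illustration. It only shows why an estimate made without the parameter value can pick a sequential scan while an estimate made with the value picks the index.

```python
# Toy cost model, NOT PostgreSQL's actual planner. All names and cost
# constants here are hypothetical, chosen only to illustrate the effect.

ROWS = 100_000
# Assumed column statistics: two values, each covering half the table.
MOST_COMMON = {1: 0.5, 2: 0.5}
N_DISTINCT = len(MOST_COMMON)

SEQ_COST_PER_ROW = 1.0      # cost of reading one row sequentially
INDEX_COST_PER_ROW = 4.0    # cost of fetching one row through the index
INDEX_STARTUP = 10.0        # fixed cost of descending the index


def selectivity(value=None):
    """Fraction of rows expected to match; value=None means 'generic'."""
    if value is None:
        # No value known: assume an average value, 1 / n_distinct.
        return 1.0 / N_DISTINCT
    # Value known: use its frequency, or nothing if it is not listed.
    return MOST_COMMON.get(value, 0.0)


def choose_plan(value=None):
    """Pick the cheaper of a sequential scan and an index scan."""
    matching = selectivity(value) * ROWS
    seq = ROWS * SEQ_COST_PER_ROW
    idx = INDEX_STARTUP + matching * INDEX_COST_PER_ROW
    return "index scan" if idx < seq else "seq scan"


generic_plan = choose_plan()        # parameter unknown at planning time
specific_plan = choose_plan(3)      # value 3 does not exist in the table

# Load from one 50 ms query per second on a single core.
load = 0.050 / 1.0
```

With these made-up numbers the generic estimate expects half the table to match and so prefers the sequential scan, while the estimate for the absent value expects no rows and prefers the index. The last line is just the 5% figure: 50 ms of work in every 1 second.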

I do not know whether the application was doing PREPARE/EXECUTE each time, or whether it was doing PREPARE once in advance and then EXECUTE each time after that, but I don't think it matters, either, as I think both cases deserve attention, and the problem is the same in both cases. Even one generic plan run costs 50+ times the cost of both planning and execution.

Re-planning a generic plan with another generic plan may generate zero benefit, with a measurable cost. More on this after...

All the points about ms seem invalid to me. There are many reasons why ms could increase, and many of them have nothing to do with plan efficiency. Again, re-planning due to a high ms, or a high ratio of ms, does not indicate that re-planning will improve the success of the plan. The planning process does not measure ms or predict ms.


My idea of an optimal system is as follows:

1) Prepare gathers and caches data about the tables involved in the query, including column statistics that are likely to be required during the planning process, but prepare does not run the planning process.

2) Execute runs the planning process re-using data cached by prepare, and then executes the plan.

3) Advanced: Execute may cache the selected plan for re-use only if it can identify a set of criteria that would allow the selected plan to be tested and invalidated if the parameter nature has changed such that a re-planning would likely choose another plan. Execute may cache multiple plans against a prepared statement, provided that each cached plan identifies invalidation criteria.

4) Even more Advanced: Prepare may identify elements of the plan that will always be the same, no matter what parameter is specified, and cache these results for substitution into the planning phase when execute is run. (Effectively lifting the planning from execute to prepare, but only where it makes obvious [= cheap to detect] sense)

This treats the whole statement planning and execution as a pipeline, lengthening the pipeline, and adjusting some of the pipeline elements from prepare to execute. It has the benefit of having fast prepare/execute whether execute is invoked only once or many times. The effect is that all statements are specifically planned, but specific plans are re-used wherever possible.
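A minimal sketch of steps 1 to 3, in Python rather than anything resembling the PostgreSQL source. Every name here (Prepared, CachedPlan, prepare, plan_for, execute) and the 0.1 "common value" threshold are hypothetical, picked only to show the shape of the idea: prepare caches statistics without planning, execute plans for the specific value, and a cached plan is re-used only while its invalidation criterion accepts the new parameter.

```python
# Hypothetical sketch of the prepare/execute split; not PostgreSQL code.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

COMMON_THRESHOLD = 0.1   # assumed cut-off for a "common" value


@dataclass
class CachedPlan:
    name: str
    # Invalidation criterion: True if this plan still suits the value.
    still_valid: Callable[[int], bool]


@dataclass
class Prepared:
    # Step 1: statistics gathered at prepare time (value -> row fraction).
    frequencies: Dict[int, float]
    plans: List[CachedPlan] = field(default_factory=list)
    plans_built: int = 0


def prepare(frequencies):
    """Cache statistics only; no planning happens here."""
    return Prepared(frequencies=dict(frequencies))


def is_common(stmt, value):
    return stmt.frequencies.get(value, 0.0) > COMMON_THRESHOLD


def plan_for(stmt, value):
    """Step 2: plan for one specific value using the cached statistics."""
    stmt.plans_built += 1
    if is_common(stmt, value):
        return CachedPlan("seq scan", lambda v: is_common(stmt, v))
    return CachedPlan("index scan", lambda v: not is_common(stmt, v))


def execute(stmt, value):
    """Step 3: re-use a cached plan whose criterion accepts this value."""
    for plan in stmt.plans:
        if plan.still_valid(value):
            return plan.name
    plan = plan_for(stmt, value)
    stmt.plans.append(plan)
    return plan.name


stmt = prepare({1: 0.5, 2: 0.5})
results = [execute(stmt, v) for v in (1, 2, 3, 4)]
```

In this toy run the four executions need only two planning passes: one plan covers the common values and one covers the values that are not common. Step 4 is left out, since what can safely be lifted into prepare depends on planner internals the sketch does not model.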

To support the case of changing data, I think the analyze process should be able to force invalidation of cached plans, and force the cached column statistics for prepared statements to be invalidated and re-queried on demand, or push new statistics directly into the prepared statements. It makes no sense (to me) to re-plan for the same parameters until an analyze is done, so this tells me that analyze is the event that should cause the re-plan to occur.
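As a rough illustration of analyze being the event that triggers re-planning, here is a small Python sketch of the "push new statistics" variant. The StatementCache class, its on_analyze hook, and the 0.1 threshold are assumptions made up for the example, not an existing PostgreSQL interface.

```python
# Hypothetical sketch; the class and method names are illustrative only.

class StatementCache:
    def __init__(self, stats):
        self.stats = dict(stats)   # cached column statistics
        self.plans = {}            # parameter value -> chosen plan
        self.replans = 0           # number of planning passes so far

    def execute(self, value):
        # Plan only when no cached plan exists for this parameter.
        if value not in self.plans:
            self.replans += 1
            common = self.stats.get(value, 0.0) > 0.1   # assumed threshold
            self.plans[value] = "seq scan" if common else "index scan"
        return self.plans[value]

    def on_analyze(self, new_stats):
        """Analyze pushes new statistics and drops every cached plan."""
        self.stats = dict(new_stats)
        self.plans.clear()


cache = StatementCache({1: 0.5, 2: 0.5})
before = cache.execute(3)          # value 3 is absent from the table
again = cache.execute(3)           # same parameter: no re-plan
replans_before_analyze = cache.replans
cache.on_analyze({1: 0.3, 2: 0.3, 3: 0.4})   # value 3 is now common
after = cache.execute(3)           # first execute after analyze re-plans
```

Between analyze runs the same parameter never triggers a second planning pass; after analyze reports that value 3 has become common, the next execute re-plans and switches to the sequential scan.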

I think anything less than the above will increase the performance of some queries while decreasing the performance of other queries. It might be possible to guess which queries are more valuable to people than others, and hard code solutions for these specific queries, but hard coding solutions will probably always be a "lowest hanging fruit" solution.

After writing this, I'm pretty sure that implementation of the above into PostgreSQL would be difficult, and it could be a valid concern that the investment is not worth the benefit at this time. It's a tough problem.


My $0.01 CDN. :-)

Cheers,
mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
