My preference is to deal with the specific value vs generic value issue.

For this issue, it can affect performance even if PREPARE/EXECUTE is execute exactly once.

In the last case I saw, a certain query was executing once every second, and with a specific value it would take < 1 ms, and with a generic value it would take > 50 ms. That's 5% system load for one CPU core to do nothing. After analysis, it was clearly a "common value" vs "not common value" problem. For this particular table, it stored an integer, but only used two values across something like 100k rows. The query was for a third value that did not exist. The difference was a sequential scan vs an index lookup.

I do not know whether the application was doing PREPARE/EXECUTE each time, or whether it was doing PREPARE once in advance and then EXECUTE each time after that, but I don't think it matters, either, as I think both cases deserve attention, and the problem is the same in both cases. Even one generic plan run costs 50+ the cost of both planning and execution.

Re-planning a generic plan with another generic plan may generate zero benefit, with a measurable cost. More on this after...

All the points about ms seem invalid to me. There are many reason why ms could increase, and many of them have nothing to do with plan efficiency. Again, re-planning due to a high ms, or a high ratio of ms, does not indicate that re-planning will improve the success of the plan. The planning process does not measure ms or predict ms.

My idea of an optimal system is as follows:

1) Prepare gathers and caches data about the tables involved in the query, including column statistics that are likely to be required during the planning process, but prepare does not running the planning process.

2) Execute runs the planning process re-using data cached by prepare, and then executes the plan.

3) Advanced: Execute may cache the selected plan for re-use only if it can identify a set of criteria that would allow the selected plan to be tested and invalidated if the parameter nature has changed such that a re-planning would likely choose another plan. Execute may cache multiple plans against a prepared statement, provided that each cached plan identify invalidation criteria.

4) Even more Advanced: Prepare may identify that elements of the plan that will always be the same, no matter what parameter is specified, and cache these results for substitution into the planning phase when execute is run. (Effectively lifting the planning from execute to prepare, but only where it makes obvious [= cheap to detect] sense)

This treats the whole statement planning and execution as a pipeline, lengthening the pipeline, and adjusting some of the pipeline elements from prepare to execute. It has the benefit of having fast prepare/execute whether execute is invoked only once or many times. The effect is that all statements are specifically planned, but specific plans are re-used wherever possible.

To support the case of changing data, I think the analyze process should be able to force invalidation of cached plans, and force the cached column statistics for prepared statements to be invalidated and re-queried on demand, or push new statistics directly into the prepared statements. It makes no sense (to me) to re-plan for the same parameters until an analyze is done, so this tells me that analyze is the event that should cause the re-plan to occur.

I think anything less than the above will increasing the performance of some queries while describing the performance of other queries. It might be possible to guess which queries are more valuable to people than others, and hard code solutions for these specific queries, but hard coding solutions will probably always be a "lowest hanging fruit" solution.

After writing this, I'm pretty sure that implementation of the above into PostgreSQL would be difficult, and it could be a valid concern that the investment is not worth the benefit at this time. It's a tough problem.

My $0.01 CDN. :-)

Cheers,
mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to