Re: [HACKERS] Avoiding bad prepared-statement plans.

Mark Mielke Fri, 26 Feb 2010 06:50:02 -0800

On 02/26/2010 05:20 AM, Jeroen Vermeulen wrote:

Mark Mielke wrote:
All the points about ms seem invalid to me. There are many reasons why ms could increase, and many of them have nothing to do with plan efficiency. Again, re-planning due to a high ms, or a high ratio of ms, does not indicate that re-planning will improve the success of the plan. The planning process does not measure ms or predict ms.
That's true, but missing some very basic points about the idea: one, if we can tell that a query is going to be expensive, then the cost of re-planning it is marginal. Two, if we can tell that a query is going to be expensive, then we stand a lot to gain if re-planning turns out to be useful. It follows that we can afford to re-plan on the off-chance, without anything more than a vague orders-of-magnitude idea of what "expensive" means.
What Tom said validates a big assumption I've been making: that we do in fact have a decent shot at telling in advance that a query is going to be expensive. Which means we have a decent shot at stopping your 100ms query from taking seconds just because you prepared it and are missing out on that tiny partial index. That would be worth the extra planning time at a 1% hit rate, and there's not much downside if we don't reach that.
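
As a rough illustration of the heuristic described above, here is a minimal sketch. It assumes the estimated execution time and the planning time can both be expressed in milliseconds; the function name, the parameter names, and the default factor are hypothetical and do not come from PostgreSQL.

```python
def should_replan(estimated_exec_ms: float,
                  planning_ms: float,
                  factor: float = 100.0) -> bool:
    """Decide whether to re-plan a prepared statement with actual parameters.

    Hypothetical rule: re-plan only when the generic plan's estimated
    execution time is at least `factor` times the cost of planning, so the
    extra planning is marginal compared to the query itself.
    """
    return estimated_exec_ms >= factor * planning_ms
```

With these assumed inputs, a query estimated at 5000 ms against 2 ms of planning would be re-planned, while one estimated at 100 ms would keep its generic plan.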


You trimmed most of my concerns. :-) Problems:

1) If I do a PREPARE/EXECUTE, the above lengthens the process from 1 generic planning plus 1 generic plan execution to 1 generic planning, 1 specific planning, and 1 specific plan execution. This is still longer overall than a regular statement, and it may still be longer than the original generic plan on its own. The hope is that the analysis is somehow detecting the scenario where a generic plan makes no sense, but the criterion is not whether the generic plan actually makes sense; the criterion is "can the customer afford to wait longer for us to second-guess ourselves?" It's a guess. As a guess, it means sometimes it will be right, and sometimes it will be wrong.

2) Only the "order of magnitude" (by estimate) plans will benefit. If you set the number to 100X, then most plans won't benefit. If you set it to less than 100X, you increase the chance of guessing wrong in other cases. In any case, there is still no guarantee that a specific plan will be faster, so even in the 100X case, the overall results could be slower - it's just that you've decided the customer can afford to wait longer.
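
The accounting in the two points above can be sketched as follows. All names are hypothetical and the timings are inputs the caller supplies, not measurements; the point is only that re-planning wins exactly when the specific planning plus the specific execution is cheaper than the generic execution it replaces.

```python
def generic_path_ms(generic_plan_ms: float, generic_exec_ms: float) -> float:
    """Total time for 1 generic planning plus 1 generic plan execution."""
    return generic_plan_ms + generic_exec_ms


def replan_path_ms(generic_plan_ms: float,
                   specific_plan_ms: float,
                   specific_exec_ms: float) -> float:
    """Total time for 1 generic planning, 1 specific planning,
    and 1 specific plan execution."""
    return generic_plan_ms + specific_plan_ms + specific_exec_ms


def replan_pays_off(generic_plan_ms: float, generic_exec_ms: float,
                    specific_plan_ms: float, specific_exec_ms: float) -> bool:
    """True only if the longer re-planning path ends up faster overall."""
    return (replan_path_ms(generic_plan_ms, specific_plan_ms, specific_exec_ms)
            < generic_path_ms(generic_plan_ms, generic_exec_ms))
```

If the specific plan turns out no faster than the generic one, the re-planning path is strictly slower by the cost of the extra planning step, which is the "guessed wrong" case.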

My idea of an optimal system is as follows:
1) Prepare gathers and caches data about the tables involved in the query, including column statistics that are likely to be required during the planning process, but prepare does not run the planning process.
It sounds to me like you're in the process of inventing another planning process. Developer time aside, how much CPU time can you afford to throw at this?

I already said I don't think PostgreSQL could easily evolve here. However, I wanted to point out that the problem may be architectural.

As for developer time and CPU time, that's not really relevant. If PREPARE/EXECUTE could be reliably sped up, then the savings are probably measured in millions of dollars or more, as it is widely used by many applications throughout the day on hundreds of thousands of computers.

Oh, you mean is it worth scratching my itch? :-) Not really. I was thinking about it yesterday and decided that such a major change might just as easily result in a new database engine, and I didn't want to go there.

Still, if some clever person agrees with me that it is an architecture problem, and that PostgreSQL could benefit from a clean "from scratch" caching mechanism for statements (note that what I described could probably be extended to support automatic prepare of every statement, and matching of query to prepared statement based on text, similar to MySQL query caching), and can come up with a way to do this using the existing architecture - that would be great. Or, they can tell me "too hard" as you are. That's fine too... :-)
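
A minimal sketch of what such a text-keyed statement cache might look like, purely for illustration. The class names and the `gather_stats` callback are hypothetical; nothing here reflects existing PostgreSQL code. Each entry caches data gathered at prepare time but deliberately holds no plan.

```python
from dataclasses import dataclass, field


@dataclass
class CachedStatement:
    """One automatically prepared statement, identified by its query text."""
    text: str
    table_stats: dict = field(default_factory=dict)  # gathered once at prepare
    uses: int = 0


class StatementCache:
    """Maps query text to a cached statement entry (hypothetical design)."""

    def __init__(self, gather_stats):
        # gather_stats(text) -> dict is an assumed callback that collects
        # table and column statistics relevant to the query.
        self._gather_stats = gather_stats
        self._entries = {}

    def lookup(self, text: str) -> CachedStatement:
        """Return the entry for `text`, preparing it on first sight."""
        entry = self._entries.get(text)
        if entry is None:
            entry = CachedStatement(text, self._gather_stats(text))
            self._entries[text] = entry
        entry.uses += 1
        return entry
```

Matching on exact text is the simplest possible key; two queries that differ only in whitespace or literal values would get separate entries under this assumption.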

I don't see any reason to argue over what would be optimal when so much information is still missing. It just makes the problem look harder than it is. To me, our best shot at getting something useful is to stay simple and defensive. After that, if there is still a need, we'll have code to help us gather more data and figure out how to make it better. Nothing wrong with the lowest-hanging fruit.


What information is missing?

PREPARE sucks in many known situations. It is a documented fact. :-)

Will "guessing" at when the user can afford to wait longer improve the situation? Maybe or often, but not always.


Cheers,
mark


