On 02/26/2010 05:20 AM, Jeroen Vermeulen wrote:
> Mark Mielke wrote:

>> All the points about ms seem invalid to me. There are many reasons why ms could increase, and many of them have nothing to do with plan efficiency. Again, re-planning due to a high ms, or a high ratio of ms, does not indicate that re-planning will yield a better plan. The planning process does not measure or predict ms.

> That's true, but it misses some very basic points about the idea: one, if we can tell that a query is going to be expensive, then the cost of re-planning it is marginal. Two, if we can tell that a query is going to be expensive, then we stand to gain a lot if re-planning turns out to be useful. It follows that we can afford to re-plan on the off-chance, without anything more than a vague orders-of-magnitude idea of what "expensive" means.

> What Tom said validates a big assumption I've been making: that we do in fact have a decent shot at telling in advance that a query is going to be expensive. Which means we have a decent shot at stopping your 100ms query from taking seconds just because you prepared it and are missing out on that tiny partial index. That would be worth the extra planning time at a 1% hit rate, and there's not much downside if we don't reach that.

You trimmed most of my concerns. :-) Problems:

1) If I do a PREPARE/EXECUTE, the above lengthens the process from one generic planning pass plus one execution of the generic plan to one generic planning pass, one specific planning pass, and one execution of the specific plan. This is still longer overall than a regular statement, and it may still be longer than the original generic plan on its own. The hope is that the analysis somehow detects the scenario where a generic plan makes no sense, but the criterion is not whether the generic plan actually makes sense - the criterion is "can the customer afford to wait longer for us to second-guess ourselves?" It's a guess. As a guess, sometimes it will be right and sometimes it will be wrong.

2) Only the plans estimated to be expensive by orders of magnitude will benefit. If you set the threshold to 100X, then most plans won't benefit. If you set it to less than 100X, you increase the chance of guessing wrong in other cases. In any case, there is still no guarantee that a specific plan will be faster, so even in the 100X case the overall result could be slower - it's just that you've decided the customer can afford to wait longer. (A sketch of the rule being guessed at is below.)
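
To be concrete, here is a rough sketch in Python (not backend C) of the decision rule as I understand it. Every name in it is invented for illustration - the threshold constant, the Plan and PreparedStatement classes, the choose_plan function - and none of it is actual PostgreSQL code:

from dataclasses import dataclass
from typing import Callable, Sequence

# Arbitrary "expensive" cutoff - the "100X"-style guess discussed above.
GENERIC_COST_THRESHOLD = 100000.0

@dataclass
class Plan:
    description: str
    estimated_cost: float

@dataclass
class PreparedStatement:
    query_text: str
    generic_plan: Plan                             # built once, at PREPARE time
    plan_with_params: Callable[[Sequence], Plan]   # stand-in for a parameter-aware planner

def choose_plan(stmt: PreparedStatement, params: Sequence) -> Plan:
    if stmt.generic_plan.estimated_cost < GENERIC_COST_THRESHOLD:
        # Cheap by estimate: re-planning overhead would dominate, so keep the
        # cached generic plan.
        return stmt.generic_plan
    # "Expensive" by estimate: pay for a second, parameter-specific planning
    # pass on the chance that it finds a much better plan (e.g. one using a
    # partial index the generic plan cannot use). The guess can lose: the
    # specific plan may be no better, and we have now planned twice.
    return stmt.plan_with_params(params)

if __name__ == "__main__":
    cheap = PreparedStatement("SELECT ...", Plan("generic", 42.0),
                              lambda params: Plan("specific", 10.0))
    print(choose_plan(cheap, ["x"]).description)   # below the cutoff -> "generic"

The second branch is the whole gamble: the extra planning pass is bought on an estimate, not on any knowledge that the generic plan is actually bad.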

>> My idea of an optimal system is as follows:

>> 1) Prepare gathers and caches data about the tables involved in the query, including column statistics that are likely to be required during the planning process, but prepare does not run the planning process.

> It sounds to me like you're in the process of inventing another planning process. Developer time aside, how much CPU time can you afford to throw at this?

I already said I don't think PostgreSQL could easily evolve here. However, I wanted to point out that the problem may be architectural.

As for developer time and CPU time, that's not really relevant. If PREPARE/EXECUTE could be reliably sped up, then the savings would probably be measured in millions of dollars or more, as it is widely used by many applications throughout the day on hundreds of thousands of computers.

Oh, you mean is it worth scratching my itch? :-) Not really. I was thinking about it yesterday and decided that such a major change might just as easily result in a new database engine, and I didn't want to go there.

Still, if some clever person agrees with me that it is an architecture problem - that PostgreSQL could benefit from a clean "from scratch" caching mechanism for statements (note that what I described could probably be extended to support automatic prepare of every statement, with queries matched to prepared statements by text, similar to MySQL query caching) - and can come up with a way to do this within the existing architecture, that would be great. Or they can tell me "too hard", as you are. That's fine too... :-)
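
To make that a little less hand-wavy, here is a very loose Python sketch of the kind of caching layer I mean: statements keyed by their text, statistics gathered once at prepare time, and planning deferred to execute time when the parameter values are known. All of the names are invented for the example, and the real thing would of course live in the backend - "gather_stats" and "plan_query" stand in for catalog lookups and the planner itself:

from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class CachedStatement:
    query_text: str
    table_stats: Dict[str, Any] = field(default_factory=dict)  # cached at prepare time

class StatementCache:
    def __init__(self,
                 gather_stats: Callable[[str], Dict[str, Any]],
                 plan_query: Callable[[str, tuple, Dict[str, Any]], Any]):
        self._cache: Dict[str, CachedStatement] = {}
        self._gather_stats = gather_stats  # collect column stats for the tables in the query
        self._plan_query = plan_query      # the planner proper, run only at execute time

    def prepare(self, query_text: str) -> CachedStatement:
        # Keyed on the statement text, so this doubles as an automatic prepare
        # for any statement, explicit PREPARE or not (MySQL-query-cache style).
        stmt = self._cache.get(query_text)
        if stmt is None:
            stmt = CachedStatement(query_text, self._gather_stats(query_text))
            self._cache[query_text] = stmt
        return stmt

    def execute(self, query_text: str, params: tuple,
                run: Callable[[Any, tuple], Any]) -> Any:
        stmt = self.prepare(query_text)
        # Planning happens here, with the real parameter values in hand, but
        # against statistics gathered once at prepare time. Plans for values
        # that recur could be memoized as well; that is left out to keep the
        # sketch small.
        plan = self._plan_query(query_text, params, stmt.table_stats)
        return run(plan, params)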

> I don't see any reason to argue over what would be optimal when so much information is still missing. It just makes the problem look harder than it is. To me, our best shot at getting something useful is to stay simple and defensive. After that, if there is still a need, we'll have code to help us gather more data and figure out how to make it better. Nothing wrong with the lowest-hanging fruit.

What information is missing?

PREPARE sucks in many known situations. It is a documented fact. :-)

Will "guessing" at when the user can afford to wait longer improve the situation? Maybe or often, but not always.

Cheers,
mark


