On 02/26/2010 05:20 AM, Jeroen Vermeulen wrote:
Mark Mielke wrote:
All the points about ms seem invalid to me. There are many reason why
ms could increase, and many of them have nothing to do with plan
efficiency. Again, re-planning due to a high ms, or a high ratio of
ms, does not indicate that re-planning will improve the success of
the plan. The planning process does not measure ms or predict ms.
That's true, but missing some very basic points about the idea: one,
if we can tell that a query is going to be expensive, then the cost of
re-planning it is marginal. Two, if we can tell that a query is going
to be expensive, then we stand a lot to gain if re-planning turns out
to be useful. It follows that we can afford to re-plan on the
off-chance, without anything more than a vague orders-of-magnitude
idea of what "expensive" means.
What Tom said validates a big assumption I've been making: that we do
in fact have a decent shot at telling in advance that a query is going
to be expensive. Which means we have a decent shot at stopping your
100ms query from taking seconds just because you prepared it and are
missing out on that tiny partial index. That would be worth the extra
planning time at a 1% hit rate, and there's not much downside if we
don't reach that.
You trimmed most of my concerns. :-) Problems:
1) If I do a PREPARE/EXECUTE, the above lengthens the process from
1 generic planning plus 1 generic plan execute to 1 generic planning, 1
specific planning, and 1 specific plan execution. This is still overall
longer than a regular statement and it still may be longer than the
original generic plan on its own. The hope is that the analysis is
somehow detecting the scenario where a generic plan makes no sense, but
the criteria is not about whether the generic plan actually does make
sense - the criteria is "can the customer afford to wait longer for us
to second guess ourselves?" It's a guess. As a guess, it means sometimes
it will be right, and sometimes it will be wrong.
2) Only the "order of magnitude" (by estimate) plans will benefit.
If you set the number to 100X, then most plans won't benefit. If you set
it to less than 100X, you increase the chance of guessing wrong in other
cases. In any case, there is still no guarantee that a specific plan
will be faster, so even in the 100X case, the overall results could be
slower - it's just that you've decided the customer can afford to wait
longer.
My idea of an optimal system is as follows:
1) Prepare gathers and caches data about the tables involved in the
query, including column statistics that are likely to be required
during the planning process, but prepare does not running the
planning process.
It sounds to me like you're in the process of inventing another
planning process. Developer time aside, how much CPU time can you
afford to throw at this?
I already said I don't think PostgreSQL could easily evolve here.
However, I wanted to point out that the problem may be architectural.
As for developer time and CPU time, that's not really relevant. If
PREPARE/EXECUTE could be reliably sped up, than the savings is probably
measure in millions of dollars or more, as it is widely used by many
applications throughout the day on hundreds of thousands of computers.
Oh, you mean is it worth scratching my itch? :-) Not really. I was
thinking about it yesterday and decided that such a major change might
just as easily result in a new database engine, and I didn't want to go
there.
Still, if some clever person agrees with me that it is an architecture
problem, and that PostgreSQL could benefit from a clean "from scratch"
caching mechanism for statements (note that what I described could
probably be extended to support automatic prepare of every statement,
and matching of query to prepared statement based on text, similar to
MySQL query caching), and can come up with a way to do this using the
existing architecture - that would be great. Or, they can tell me "too
hard" as you are. That's fine too... :-)
I don't see any reason to argue over what would be optimal when so
much information is still missing. It just makes the problem look
harder than it is. To me, our best shot at getting something useful
is to stay simple and defensive. After that, if there is still a
need, we'll have code to help us gather more data and figure out how
to make it better. Nothing wrong with the lowest-hanging fruit.
What information is missing?
PREPARE sucks in many known situations. It is a documented fact. :-)
Will "guessing" at when the user can afford to wait longer improve the
situation? Maybe or often, but not always.
Cheers,
mark
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers