On Fri, September 1, 2006 16:53, Martijn van Oosterhout wrote:
> Interesting thought. It might be worth trying. But my big question: is
> all this testing and counting actually going to be faster than just
> replanning? Postgresql's planner is not that slow.
In the best case (which of course would have to be very frequent for any of this to matter in the first place) it's mainly just a short loop comparing the call's parameter values to their counterparts stored with the plan and updating those two-bit confidence counters. You wouldn't *believe* how simple you have to keep these things in processor architecture. :-)

> The thing is that number of possible plans is going to be proportional
> to factorial(number of tables). Once you have 3 tables you're going to
> have at least a dozen possible plans, probably more. What the best plan
> is depends strongly on what the parameters are.

Of course. That's the whole point: to end up with a small but effective subset of all those possible plans. I'd guess that you could cover even most of the nasty cases with a maximum of three plans or so per prepared statement, including the original fully-generalized one. The plans could be replaced on an LRU basis, which isn't very costly for three or so entries.

> Anyway, your plan assumes that you have information to work with. The
> current system plans prepared queries with no information at all about
> parameters and people are advocating to keep it that way. I think a
> good first step would be the plan on first execution, like Oracle does.

Yes, delaying things a bit can help a lot sometimes. That's also what JIT compilers in JVMs do, for instance. FWIW, libpqxx doesn't prepare statements until they're first called anyway.

But if this choice to discard parameter information is exactly what causes a lot of the bad plans in the first place, as Peter says, what's wrong with putting it to use instead? For those cases you're pretty much screwed by definition as long as you fail to do so. And it's not like what I'm suggesting is very difficult!

The real question is whether it's worthwhile. To find that out, we'd need to estimate four factors: coverage (how often you'd get a useful prediction), accuracy (how often that prediction would be correct), cost of misprediction (near-zero compared to the current situation, assuming we keep the generalized plans handy), and savings from a correct prediction (in our case, the benefit of planning for a constant instead of a variable, minus the cost of re-planning, which you say isn't very expensive). Based on what Peter and you tell me about cost, the main worries here are coverage and accuracy.

Coverage and accuracy can be measured (and tweaked!) relatively easily if we have logs of prepared-statement executions from a wide variety of real-life applications. Listings of consecutive prepared-statement invocations (statement name plus parameter values) are all that's needed. Do we have any logs like that?

If we do, I'll be more than happy to run some simulations and see whether the idea shows any promise. Like I said, there's every chance that it doesn't. It was just an off-the-cuff suggestion, and if it's no good I'll have no problem saying so. But there's not much point sitting around arguing over theoretical merits if they're that easy to quantify!

Jeroen
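
P.S. To make the bookkeeping concrete, here's a rough C++ sketch of the kind of per-statement state I have in mind: a two-bit saturating confidence counter per parameter, plus a small LRU-replaced set of specialized plans, with the generalized plan as the fallback. This isn't real PostgreSQL or libpqxx code; the types and names (CachedPlan, PlanCache, lookup, remember) are invented purely for illustration.

#include <cstdint>
#include <list>
#include <string>
#include <vector>

// One specialized plan, remembered together with the parameter values it was
// planned for and a two-bit saturating confidence counter per parameter.
struct CachedPlan
{
    std::vector<std::string> param_values;   // values this plan was specialized for
    std::vector<std::uint8_t> confidence;    // 0..3, one counter per parameter
    // ...the plan itself would hang off here...
};

// A small, LRU-replaced set of specialized plans for one prepared statement.
// The fully generalized plan lives elsewhere and serves as the fallback.
class PlanCache
{
public:
    explicit PlanCache(std::size_t max_plans = 3) : max_plans_(max_plans) {}

    // The "short loop": compare the call's parameter values to the ones stored
    // with each cached plan, updating the counters as we go.  Returns a plan
    // only on an exact match; otherwise the caller falls back to the
    // generalized plan (and may decide to specialize a new one).
    CachedPlan *lookup(std::vector<std::string> const &params)
    {
        for (auto it = plans_.begin(); it != plans_.end(); ++it)
        {
            if (update_and_compare(*it, params))
            {
                plans_.splice(plans_.begin(), plans_, it);   // move to LRU front
                return &plans_.front();
            }
        }
        return nullptr;
    }

    // Remember a freshly specialized plan, evicting the least recently used one.
    void remember(std::vector<std::string> const &params)
    {
        if (plans_.size() >= max_plans_) plans_.pop_back();
        plans_.push_front(
            CachedPlan{params, std::vector<std::uint8_t>(params.size(), 1)});
    }

private:
    // Bump a parameter's counter when its value repeats, decay it when it
    // doesn't.  A high counter means "this parameter looks stable enough to
    // plan as a constant"; a low one means it behaves like a real variable.
    static bool update_and_compare(CachedPlan &plan,
                                   std::vector<std::string> const &params)
    {
        if (plan.param_values.size() != params.size()) return false;
        bool exact_match = true;
        for (std::size_t i = 0; i < params.size(); ++i)
        {
            if (params[i] == plan.param_values[i])
            {
                if (plan.confidence[i] < 3) ++plan.confidence[i];
            }
            else
            {
                if (plan.confidence[i] > 0) --plan.confidence[i];
                exact_match = false;
            }
        }
        return exact_match;
    }

    std::size_t max_plans_;
    std::list<CachedPlan> plans_;   // front = most recently used
};

The per-call work is a handful of value comparisons and counter updates; deciding when to build a new specialized plan (presumably once the counters say a parameter has been stable for a while) is left to the caller in this sketch.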
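
P.P.S. And this is roughly the simulation I'd run over such a log to get coverage and accuracy numbers. I'm assuming a made-up log format here (one invocation per line: statement name, then its parameter values, tab-separated); whatever the real logs look like, the idea is the same. It replays the invocations, predicts that a parameter will repeat its previous value whenever its two-bit counter is confident, and reports coverage (how often it dared to predict) and accuracy (how often the prediction turned out right).

#include <cstdint>
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Per-statement prediction state: last seen value and a two-bit saturating
// counter per parameter position.
struct StatementState
{
    std::vector<std::string> last_value;
    std::vector<std::uint8_t> counter;   // 0..3; predict when >= 2
};

int main(int argc, char **argv)
{
    if (argc != 2)
    {
        std::cerr << "usage: " << argv[0] << " <logfile>\n";
        return 1;
    }
    std::ifstream log(argv[1]);

    std::map<std::string, StatementState> stats;
    long params_seen = 0, predicted = 0, correct = 0;

    // Assumed input format: one invocation per line, tab-separated fields:
    // statement name, then its parameter values.
    std::string line;
    while (std::getline(log, line))
    {
        std::istringstream fields(line);
        std::string name, value;
        if (!std::getline(fields, name, '\t')) continue;

        std::vector<std::string> params;
        while (std::getline(fields, value, '\t')) params.push_back(value);

        StatementState &st = stats[name];
        if (st.last_value.size() != params.size())
        {
            // First sighting (or changed arity): just remember the values.
            st.last_value = params;
            st.counter.assign(params.size(), 1);
            continue;
        }

        for (std::size_t i = 0; i < params.size(); ++i)
        {
            ++params_seen;
            bool confident = st.counter[i] >= 2;          // would we have predicted?
            bool same = (params[i] == st.last_value[i]);  // would it have been right?
            if (confident)
            {
                ++predicted;
                if (same) ++correct;
            }
            // Update the two-bit counter and the remembered value.
            if (same) { if (st.counter[i] < 3) ++st.counter[i]; }
            else      { if (st.counter[i] > 0) --st.counter[i]; st.last_value[i] = params[i]; }
        }
    }

    std::cout << "parameters seen: " << params_seen << '\n'
              << "coverage: " << (params_seen ? 100.0 * predicted / params_seen : 0) << "%\n"
              << "accuracy: " << (predicted ? 100.0 * correct / predicted : 0) << "%\n";
    return 0;
}

Something like "g++ -std=c++11 simulate.cpp && ./a.out invocations.log" would do, with simulate.cpp and invocations.log standing in for whatever the files actually end up being called.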