On Fri, September 1, 2006 16:53, Martijn van Oosterhout wrote:

> Interesting thought. It might be worth trying. But my big question: is
> all this testing and counting actually going to be faster than just
> replanning? Postgresql's planner is not that slow.

In the best case (which of course would have to be very frequent for any
of this to matter in the first place) it's mainly just a short loop
comparing the call's parameter values to their counterparts stored with
the plan and updating those two-bit confidence counters.  You wouldn't
*believe* how simple you have to keep these things in processor
architecture.  :-)
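
To make that concrete, here is roughly the shape of the check I have in
mind, sketched in plain C with made-up names (the real thing would hang
off the plan cache and compare Datums rather than strings):

#include <stdbool.h>
#include <string.h>

#define MAX_PARAMS 8

typedef struct SpecializedPlan
{
    int           nparams;
    const char   *param_values[MAX_PARAMS]; /* values the plan was built for */
    unsigned char confidence[MAX_PARAMS];   /* two-bit saturating counters, 0..3 */
    /* ... the specialized plan tree itself would hang off here ... */
} SpecializedPlan;

/* Compare the call's parameters to the ones the plan was built for and
 * update the counters; return true if the plan can be reused as-is. */
static bool
plan_still_trusted(SpecializedPlan *plan, const char *const *call_values)
{
    bool        usable = true;

    for (int i = 0; i < plan->nparams; i++)
    {
        if (strcmp(plan->param_values[i], call_values[i]) == 0)
        {
            if (plan->confidence[i] < 3)    /* saturate at "strongly stable" */
                plan->confidence[i]++;
        }
        else
        {
            if (plan->confidence[i] > 0)    /* saturate at "keeps changing" */
                plan->confidence[i]--;
            usable = false;                 /* value differs: not this plan */
        }
    }
    return usable;
}

A caller that gets false back just falls through to the generalized plan
(or re-plans), so the cost of a wrong guess is next to nothing.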


> The thing is that number of possible plans is going to be proportional
> to factorial(number of tables). Once you have 3 tables you're going to
> have at least a dozen possible plans, probably more. What the best plan
> is depends strongly on what the parameters are.

Of course.  That's the whole point: to end up with a small but effective
subset of all those possible plans.  I'd guess that you could cover even
most of the nasty cases with something like three plans per prepared
statement, including the original fully-generalized one.  The plans could
be replaced on an LRU basis, which isn't very costly for that few
entries.
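
For what it's worth, the bookkeeping for that replacement is about as
small as it sounds; again a sketch with invented names, not a patch:

typedef struct SpecializedPlan SpecializedPlan;     /* as sketched above */

#define MAX_SPECIALIZED_PLANS 3

typedef struct PlanSlot
{
    SpecializedPlan *plan;      /* NULL if the slot is free */
    unsigned long    last_used; /* logical clock, bumped on every call */
} PlanSlot;

typedef struct PreparedStatementPlans
{
    PlanSlot      slots[MAX_SPECIALIZED_PLANS];
    unsigned long clock;
    /* the fully-generalized plan would be kept separately and never
     * evicted, so there is always something to fall back on */
} PreparedStatementPlans;

/* Pick a slot for a freshly specialized plan: a free one if any is left,
 * otherwise the least recently used. */
static PlanSlot *
choose_victim_slot(PreparedStatementPlans *ps)
{
    PlanSlot   *victim = &ps->slots[0];

    for (int i = 0; i < MAX_SPECIALIZED_PLANS; i++)
    {
        if (ps->slots[i].plan == NULL)
            return &ps->slots[i];
        if (ps->slots[i].last_used < victim->last_used)
            victim = &ps->slots[i];
    }
    return victim;
}

With three entries, "LRU" is just a linear scan for the oldest timestamp;
no real data structure needed.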


> Anyway, your plan assumes that you have information to work with. The
> current system plans prepared queries with no information at all about
> parameters and people are advocating to keep it that way. I think a
> good first step would be the plan on first execution, like Oracle does.

Yes, delaying things a bit can help a lot sometimes.  That's also what JIT
compilers in JVMs do, for instance.  FWIW, libpqxx doesn't prepare
statements until they're first called anyway.

But if this choice to discard parameter information is exactly what causes
a lot of the bad plans in the first place, as Peter says, what's wrong
with putting it to use instead?  For those cases, you're pretty much
screwed by definition as long as you fail to do so.  And it's not like
what I'm suggesting is very difficult!

The real question is whether it's worthwhile.  To find that out, we'd need
to estimate four factors: coverage (how often you'd get a prediction
confident enough to act on), accuracy (how often that prediction would be
right), cost of misprediction (near-zero compared to the current
situation, assuming we keep the generalized plans handy), and savings for
a correct prediction (in our case, the benefit of planning for a constant
instead of a variable, minus the cost of re-planning, which you say isn't
very expensive).
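
Put together, the back-of-envelope arithmetic would look something like
this; every input is a placeholder to be measured, not a number I'm
claiming:

/* Rough expected saving per prepared-statement call. */
double
expected_saving_per_call(double coverage,        /* calls with a confident prediction */
                         double accuracy,        /* fraction of those that are right */
                         double gain_correct,    /* saving of a correct specialized plan */
                         double cost_mispredict) /* extra cost of a wrong guess */
{
    return coverage * (accuracy * gain_correct
                       - (1.0 - accuracy) * cost_mispredict);
}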

Based on what you and Peter tell me about cost, the main worries here are
coverage and accuracy.  Both can be measured (and tweaked!) relatively
easily if we have logs of prepared-statement executions from a wide
variety of real-life applications.  Listings of consecutive
prepared-statement invocations (statement name plus parameter values) are
all that's needed.
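
Concretely, the simulation could be as dumb as the following: replay such
a log (one call per line: statement name, a tab, then the parameter
values; the format here is invented) through a last-value predictor with
a two-bit counter, and count how often it would have been trusted and how
often it would have been right:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct StmtState
{
    char        name[64];
    char        last_params[256];
    int         confidence;     /* 0..3 */
    struct StmtState *next;
} StmtState;

int
main(void)
{
    StmtState  *stmts = NULL;
    long        calls = 0, predicted = 0, correct = 0;
    char        line[512];

    while (fgets(line, sizeof(line), stdin))
    {
        char       *params = strchr(line, '\t');
        StmtState  *s;

        if (!params)
            continue;
        *params++ = '\0';

        /* find or create the per-statement state */
        for (s = stmts; s; s = s->next)
            if (strcmp(s->name, line) == 0)
                break;
        if (!s)
        {
            s = calloc(1, sizeof(*s));
            snprintf(s->name, sizeof(s->name), "%s", line);
            s->next = stmts;
            stmts = s;
        }

        calls++;
        bool match = (strcmp(s->last_params, params) == 0);

        if (s->confidence >= 2)         /* would have trusted a cached plan */
        {
            predicted++;
            if (match)
                correct++;
        }
        if (match)
        {
            if (s->confidence < 3)
                s->confidence++;
        }
        else
        {
            if (s->confidence > 0)
                s->confidence--;
            snprintf(s->last_params, sizeof(s->last_params), "%s", params);
        }
    }

    printf("coverage: %.1f%%  accuracy: %.1f%%\n",
           calls ? 100.0 * predicted / calls : 0.0,
           predicted ? 100.0 * correct / predicted : 0.0);
    return 0;
}

Run over real traces, that gives coverage and accuracy directly, and
plugging them into the arithmetic above tells us whether the whole thing
is worth pursuing.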

Do we have any logs like that?  If we do, I'll be more than happy to run
some simulations and see if the idea shows any promise.  Like I said,
there's every chance that it doesn't.  It was just an off-the-cuff
suggestion, and if it's no good I'll have no problem saying so.  But
there's not much point sitting around arguing over theoretical merits if
they're that easy to quantify!


Jeroen


