Tom Lane wrote:
Oliver Jowett <[EMAIL PROTECTED]> writes:

Following a recent discussion on the JDBC list (http://archives.postgresql.org/pgsql-jdbc/2004-05/msg00100.php) I'm looking at modifying when unnamed statements received via a v3 protocol Parse message are planned. The idea is to delay planning until the Bind message arrives, so that the planner benefits from knowing the actual parameter values in use. This means that clients such as JDBC can use an unnamed Parse/Bind as a way to provide typed and/or binary parameter values without running the risk of worsening the query plan.


I'm a bit concerned about driving such a change in behavior off nothing
more than whether the statement is named or not.  The protocol spec does
say this:

: In addition, an "unnamed" prepared statement and portal exist. Although
: these behave largely the same as named objects, operations on them are
: optimized for the case of executing a query only once and then
: discarding it, whereas operations on named objects are optimized on the
: expectation of multiple uses.

so you could argue that this change is just an optimization of that
kind.  It had better be documented though.

Ideally the when-to-plan decision should be client-controllable per statement. I couldn't see a way to do this without causing a protocol version change; using the unnamed statement sounded like a reasonable compromise.


It'd also make the extended query protocol closer to the simple query protocol as the documentation claims. This is the underlying problem that bit me: the extended query protocol is *not* just the simple query protocol broken down into smaller steps. There are non-obvious changes deep in the guts of the system if you use the extra functionality of the extended protocol.

Would it be better to bite the bullet and bump the protocol version, adding a new field to Parse to control this behaviour?

3. In execute_bind_message, if the source statement is not planned (plan_list == NULL), plan it, passing the concrete parameter values received in the Bind message.


I am not sure you can rely on plan_list == NULL to be the driving test;
consider an empty query string for instance.  It'd be better to add a
boolean planning_done field to the struct.

Oops -- I made the incorrect assumption that NIL != NULL (i.e. it was a shared placeholder node for empty lists).


Looks like this would only bite in the empty-query case (where querytree_list is NIL). Trying to plan that via pg_plan_queries looks fairly harmless. Nevertheless, I can add a field to the struct.

The planning happens using the target portal's memory context; the resulting plan_list is only used for this portal and the source statement remains unplanned.


I don't like that at all.  Do the planning using the given parameters,
but save the plan.  Otherwise you have substantially pessimized the
behavior for the case where an unnamed statement is reused.

How often is the unnamed statement reused? From my exhaustive sample of one (the JDBC driver) it's never reused. (admittedly that means it doesn't affect me either way, but..)


If the planning behaviour was controllable per-statement, would you be ok with replanning on every Bind when the statement asks for delaying planning?

The reason I don't like storing the plan is that I see nothing special about the first set (or Nth set) of parameters if the statement is being reused, and you could come up with a plan that's substantially worse than if you'd just planned before parameters were available (e.g. picking an indexscan over a seqscan because the first Bind had selective parameters, then subsequently Binding nonselective parameters).

4. In planner(), if parameters are provided, replace Param nodes with Const nodes in the query tree where appropriate.


No, you can't replace Params with Consts.  You can use the values of
Params in the places where selfuncs.c wants to extract constant values
for estimation purposes.

Transforming the tree seemed the most reliable way to get a result that's consistent with Const being in the tree from the start (i.e. the simple query case), and doing the replacement hasn't broken anything in my (admittedly brief) testing yet. What should I be looking at to see the breakage?


Note that the Const vs. Param difference appears to affect eval_const_expressions too, so it's more than just the selectivity functions that would need changing if we want to produce the same query plans as from an equivalent simple query.

... it requires threading the parameter values through many planner functions. There are quite a few (20-30ish?) functions affected.


This would be a pretty thorough mess, especially after you got done
propagating the info to selectivity functions.  Frankly I'd suggest
using a planner global variable instead; see PlannerParamList for a
model.

If I'm replacing the Param nodes it doesn't need to go as deep as the selectivity functions. It's not great but it's not a disaster either. Certainly if I needed to propagate all the way to the selectivity functions it'd get out of control.


Would it be safe to save-and-clear the global parameter state only at the topmost planner() level as is done for PlannerParamList and friends? I'm concerned about the global parameter state interfering with the parameter values of things like functions evaluated during planning. Tracing the opportunities for recursion in the planner is, well, interesting..

Thanks for your comments!

-O

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Reply via email to