[HACKERS] Some notes about Param handling with "Oracle style" plpgsql variables

Tom Lane Mon, 02 Nov 2009 08:31:24 -0800

One of the interesting properties of Oracle-compatible variable
references in plpgsql is that the set of variables referenced by a
given query could change during a forced replan.  For example,
consider


        declare x int;
                r record;
        ...
        for r in select x,y from tab loop ...

If tab contains a column "x" then the "x" reference in the SELECT
refers to tab.x; if not, it refers to the plpgsql variable x.
So when first executing the SELECT we might find that it requires
a Param reference to the plpgsql variable, and then after a replan
is forced by ALTER TABLE tab ADD COLUMN x, there is no need for
the Param anymore.  Or vice versa.

This kinda calls into question whether the Oracle way is actually
a good idea or not; but my purpose here is not to debate that,
just to look at what it takes to implement it.

Currently, plpgsql generates a list of the variables referenced by
any SQL statement or expression immediately upon seeing the text,
before it's ever even fed to the core parser.  I had been envisioning
having the parser callback hook construct the list on-the-fly during
parsing, but the possibility that the list will change from time to
time means that other changes are needed too.  Notably:

1. plancache.c does not have any provision for letting the Param type
array associated with a stored statement change when the statement is
replanned due to SI invalidation.

2. The control flow for a replan is that plpgsql calls SPI_execute_plan,
which calls RevalidateCachedPlan, which does the replan if the cached
plan is discovered to be stale.  However, plpgsql already had to set up
the list of actual parameter values before it called SPI_execute_plan,
which means it is *way* too late to change the list of required Params
even if plancache let us do it.

After chewing on these facts for awhile, I am thinking that the best
solution is for plpgsql to abandon the notion of a predetermined list
of parameters for a SQL query altogether.  What that list basically
provides is a mapping from Param numbers ($n) to plpgsql "datum numbers"
(indexes in the list of a plpgsql function's variables).  We could make
that mapping always be one-to-one, since there's no real reason that the
Params available to a query have to be consecutively numbered.  So the
transformColumnRef hook would just pass back a Param using the
referenced variable's datum number as paramid; it wouldn't bother at all
with building a data structure listing the specific variables actually
used in the query.

As far as plancache goes, it would therefore always see a null array
of Param type OIDs associated with a plpgsql-generated query, and we'd
not have to provide a way to update that.  (We'd still keep the ability
to store such an array, because most other callers of plancache will
still want a fixed list of Params.)  What we'd have to add to plancache
instead is the ability to install caller-determined parser callback
hooks when it is calling the parser for a replan.  This seems fairly
easy to do --- I'm envisioning a sort of meta-hook function that gets
called with the new ParseState and can insert hook function pointers
in it.

The other issue with this is what to do at runtime.  We could do it
with no other changes if we had plpgsql always set up Values/Nulls
arrays listing *every* datum's current value.  This seems a bit
brute-force though --- it could be slow in a function with a lot of
variables, and in most cases any specific query or expression would
not need most of those values.  What I think we should do instead
is extend the ParamListInfo structure to add a callback hook function
that populates individual ParamExternData array entries on-demand.
The core executor would call the hook when it tried to fetch the
value of a Param that was currently invalid (ptype == 0).  So the
hook would be invoked only once per query per referenced parameter,
which shouldn't be much overhead.  Another interesting property
of this approach is that it'd fix the longstanding user complaint
that constructions like
        if (TG_OP = 'INSERT' and NEW.foo = 'bar') ...
fail prematurely.  The executor would never demand the value
of NEW.foo, and thus not fail, if TG_OP isn't INSERT.

Comments?

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Some notes about Param handling with "Oracle style" plpgsql variables

Reply via email to