Josh Berkus wrote:
Tom,


What I'd like to do is implement the constant method for 8.2, and work
on doing the S() method later on.  Does that make sense?

I'm not thrilled with putting in a stopgap that we will have to support
forever.  The constant method is *clearly* inadequate for many (probably
most IMHO) practical cases.  Where do you see it being of use?


Well, mostly for the real-world cases where I've run into SRF estimate problems, and those have mostly been SRFs that return one row.


W.R.T. the estimator function method, the concern about recursion seems
misplaced.  Such an estimator presumably wouldn't invoke the associated
function itself.


No, but if you're calling the S() estimator in the context of performing a join, what do you supply for parameters?

I've been thinking about this more, and now I don't see why this is an issue. When the planner estimates how many rows will be returned from a subquery that is used within a join, it can't know which "parameters" to use either. (Parameters here means whatever conditions the subquery depends on that come from some other part of the execution of the full query.) So it seems to me that function S() is at no more of a disadvantage than the planner.

If I defined a function S(a integer, b integer) which provides an estimate for the function F(a integer, b integer), then S(null, null) could be called when the planner can't know what a and b are. S could then still make use of the table statistics to provide some sort of estimate. Of course, this would mean that S() functions cannot be declared STRICT, since a strict function returns null without being called whenever any argument is null.
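To make the S(null, null) idea concrete, here is a rough sketch in Python. Nothing in it is an existing PostgreSQL API: the statistics dictionary, its numbers, and the name estimate_rows_f are all made up for illustration, and the "divide by distinct values" rule is just one simple assumption about how S might use table statistics.

```python
from typing import Optional

# Hypothetical stand-in for table statistics; these numbers and keys are
# invented for illustration and are not read from any real catalog.
TABLE_STATS = {"row_count": 10_000, "distinct_a": 100, "distinct_b": 50}


def estimate_rows_f(a: Optional[int], b: Optional[int]) -> float:
    """Hypothetical S(a, b): estimated number of rows returned by F(a, b).

    None plays the role of SQL null, meaning the planner could not supply
    that parameter. The estimator must therefore accept nulls, which is
    why it could not be strict.
    """
    rows = float(TABLE_STATS["row_count"])
    # Each known parameter narrows the estimate, assuming a uniform
    # distribution over that column's distinct values.
    if a is not None:
        rows /= TABLE_STATS["distinct_a"]
    if b is not None:
        rows /= TABLE_STATS["distinct_b"]
    # Never estimate fewer than one row.
    return max(rows, 1.0)
```

With both parameters unknown the sketch falls back to the whole-table row count; each parameter the planner can supply tightens the estimate.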

I'm more concerned about coming up with a usable API for such things.
Our existing mechanisms for estimating operator selectivities require
access to internal planner data structures, which makes it pretty much
impossible to write them in anything but C.  We'd need something cleaner
to have a feature I'd want to export for general use.


Yes -- we need to support the simplest case, which is functions that return either (a) a fixed number of rows, or (b) a fixed multiple of the number of rows passed to the function. These simple cases should be easy to build. For more complex estimation, I personally don't see a problem with forcing people to hack it in C.
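The two simple cases could be modelled roughly as follows. This is only a sketch in Python under my own assumptions; the names constant_estimate and multiple_estimate are hypothetical, and nothing here reflects an actual planner interface.

```python
from typing import Callable

# An estimator maps the number of rows passed to the function to the
# number of rows the function is expected to return.
Estimator = Callable[[float], float]


def constant_estimate(rows: float) -> Estimator:
    """Case (a): the function returns a fixed number of rows."""
    def estimator(input_rows: float) -> float:
        # The input row count is ignored entirely.
        return rows
    return estimator


def multiple_estimate(factor: float) -> Estimator:
    """Case (b): the function returns a fixed multiple of its input rows."""
    def estimator(input_rows: float) -> float:
        return factor * input_rows
    return estimator


# Example: an SRF that always returns one row, and one that triples its input.
one_row = constant_estimate(1)
tripler = multiple_estimate(3)
```

Both cases need only one number from the function's author, which is what makes them easy to declare without writing any C.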

Could we provide table statistics access functions in whatever higher-level language S() is written in, or is there something fundamentally squirrelly about the statistics that would make this impossible?

Also, since we haven't nailed down a language for S(), if we allowed any of SQL, PL/pgSQL, PL/Perl, PL/Python, etc., then we would need statistics access functions for each, which would place a burden on all PLs, right? That argument isn't strong enough to make me lean either way; it's just an observation.


