On Sat, Apr 24, 2010 at 1:31 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> At least AIUI, the use case for this feature is that you want to avoid
>> creating "the same" temporary table over and over again.
>
> The context that I've seen it come up in is that people don't want to
> clutter their functions with create-it-if-it-doesn't-exist logic,
> which you have to have given the current behavior of temp tables.
> Any performance gain from reduced catalog churn would be gravy.

I think there's a significant contingent on this mailing list who feel
that that gravy would be rather tasty and would like very much to
enjoy some of it along with their temporary table tetrazzini.

> Aside from the DROP problem, I think this implementation proposal
> has one other big shortcoming: what are you going to do about
> table statistics?  In many cases, you really *have* to do an ANALYZE
> once you've populated a temp table, if you want to get decent plans
> for it.  Where will you put those stats?

For a first cut, I had thought about ignoring the problem.  Now, that
may sound stupid, because now if two different backends have very
different distributions of data in the table and both do an ANALYZE,
one set of statistics will clobber the other set of statistics.  On
the flip side, for some usage patterns, it might be actually work out
to a win.  Maybe the data I'm putting in here today is a great deal
like the data I put in here yesterday, and planning it with
yesterday's statistics doesn't cost enough to be worth a re-ANALYZE.

If we don't want to do that, I suppose one option is to create a
pg_statistic-like table in the backend's temporary tablespace and put
them there; or we could put them into a backend-local hash table.  The
current setup of pg_statistic is actually somewhat weak for a number
of things we might want to do: for example, it might be interesting to
gather statistics for the subset of a table for which a particular
partial index is predOK.  When such an index is available for a
particular query, we could use the statistics for that subset of the
table instead of the overall statistics for the table, and get better
estimates.  Or we could even let the user specify predicates which
will cause the table to have a different statistical distribution than
the table as a whole, and gather statistics for the subset that
matches the predicate.  One approach would be to make the starelid
column able to reference something other than a relation OID, although
I don't think that actually helps with the global temp table problem
because if we use the real pg_statistic to store the data then we have
to arrange to clean it up.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to