On Sat, Sep 12, 2009 at 6:14 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Sep 6, 2009, at 10:45 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> ... But now that we have a plan for a less obviously broken costing
>>> approach, maybe we should open the floodgates and allow materialization
>>> to be considered for any inner path that doesn't materialize itself
>>> already
>
>> Maybe.  I think some experimentation will be required.  We also have
>> to be aware of effects on planning time; match_unsorted_outer() is,
>> AIR, a significant part of the CPU cost of planning large join problems.
>
> I've committed some changes pursuant to this discussion.  It may be that
> match_unsorted_outer gets a bit slower, but I'm not too worried about
> that.  My experience is that the code that tries different mergejoin
> options eats way more cycles than the nestloop code does.

One problem with the current implementation of cost_rescan() is that
it ignores caching effects.  It seems to be faster to rescan a
materialize node than it is to rescan a seqscan of a table, even if
there are no restriction clauses, presumably because you get to skip
tuple visibility checks and maybe some other overhead, too.  But
cost_rescan() thinks that rescanning the table will require rereading
the whole thing from disk, which isn't right either - it probably
ought to factor in effective_cache_size much as the estimates for
iterated index scans do.  I'm not sure how many real problems this is
going to create.
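
To make the caching point concrete, here is a toy sketch in Python. It is
not the costsize.c code and not a proposed patch. The function names, the
linear "fraction of the table that fits in cache" model, and the choice to
charge cpu_operator_cost per tuple for a materialize rescan are all
assumptions made for illustration; only the default values of
seq_page_cost, cpu_tuple_cost and cpu_operator_cost are the stock settings.

```python
# Toy model only: every function here is hypothetical, not a PostgreSQL API.

def cached_fraction(table_pages, effective_cache_pages):
    """Assumed model: the share of the table expected to still be cached."""
    if table_pages <= 0:
        return 1.0
    return min(1.0, effective_cache_pages / table_pages)


def seqscan_rescan_cost(table_pages, tuples, effective_cache_pages,
                        seq_page_cost=1.0, cpu_tuple_cost=0.01):
    """Rescan cost of a plain seqscan, charging I/O only for uncached pages."""
    uncached_pages = table_pages * (
        1.0 - cached_fraction(table_pages, effective_cache_pages))
    return uncached_pages * seq_page_cost + tuples * cpu_tuple_cost


def material_rescan_cost(tuples, cpu_operator_cost=0.0025):
    """Assumed rescan cost of an in-memory materialize node: CPU only."""
    return tuples * cpu_operator_cost


if __name__ == "__main__":
    # 1000-page table with 100000 tuples, at three cache sizes (in pages).
    for cache_pages in (0, 250, 2000):
        print(cache_pages,
              seqscan_rescan_cost(1000, 100000, cache_pages),
              material_rescan_cost(100000))
```

With a cache size of zero the seqscan rescan is charged for every page,
which is the behavior complained about above. Once the table fits in cache
only the per-tuple CPU charge is left, and in this toy model that charge is
still higher than the materialize rescan.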

Another potential problem is that materializing a whole-table seqscan
to avoid repeating the tuple visibility checks may be a win in some
strict sense, but there are externalities: it's also going to use a
lot more memory/disk than just rescanning the table.
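
The size of that externality can be estimated on the back of an envelope.
The sketch below is again toy Python with hypothetical names. The 24-byte
per-tuple overhead is an assumed figure, not a measured one, and comparing
the total against work_mem is a simplification of whatever the tuplestore
really does.

```python
# Toy estimate only: names and the overhead constant are assumptions.

def material_footprint(tuples, tuple_width_bytes,
                       work_mem_kb=1024, per_tuple_overhead=24):
    """Return (estimated bytes held, whether that fits within work_mem)."""
    nbytes = tuples * (tuple_width_bytes + per_tuple_overhead)
    fits_in_memory = nbytes <= work_mem_kb * 1024
    return nbytes, fits_in_memory


if __name__ == "__main__":
    # 100000 tuples of 100 bytes each against a 1MB work_mem.
    print(material_footprint(100000, 100))
```

A cost model that only compares rescan costs never sees this number, which
is the sense in which the memory/disk usage is an externality.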

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
