On Thu, Aug 30, 2018 at 5:58 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
>
> Alexander Korotkov <a.korot...@postgrespro.ru> writes:
> > On Thu, Aug 30, 2018 at 5:05 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> >> Because it's what the mental model of startup cost says it should be.
>
> > From this model we make a conclusion that we're starting getting rows
> > from sequential scan sooner than from index scan.  And this conclusion
> > doesn't reflect reality.
>
> No, startup cost is not the "time to find the first row".  It's overhead
> paid before you even get to start examining rows.
>
> I'm disinclined to consider fundamental changes to our costing model
> on the basis of this example.  The fact that the rowcount estimates are
> so far off reality means that you're basically looking at "garbage in,
> garbage out" for the cost calculations --- and applying a small LIMIT
> just magnifies that.
>
> It'd be more useful to think first about how to make the selectivity
> estimates better; after that, we might or might not still think there's
> a costing issue.
I understand that startup cost is not "time to find the first row".  But I
think this example highlights not one but two issues:

1) The row count estimates for the joins are wrong.
2) Rows are assumed to be continuous, while in reality they are discrete.

So, if we reverse the assumptions made in LIMIT clause estimation, we may
say that it basically assumes we need to fetch only a fraction of a row
from the sequential scan node (a rough sketch of that interpolation is at
the end of this message).  And even in the case where we really do fetch
101 rows in each join with t2, this logic would still lead us to the bad
plan.  I'm not proposing to rush into redesigning the planner to fix that;
I just think it's probably something worth discussing.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
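To make the "fraction of a row" point concrete, below is a minimal
standalone sketch of the kind of linear interpolation the LIMIT costing
performs between a path's startup and total cost.  It is illustrative
only, not the actual PostgreSQL code, and all the numbers (costs, row
estimate, limit count) are made up:

/*
 * Illustrative sketch, not PostgreSQL source: LIMIT is assumed to cost a
 * proportional fraction of the underlying path's run cost, as if rows were
 * produced at a perfectly uniform (continuous) rate.
 */
#include <stdio.h>

int main(void)
{
    double startup_cost = 0.43;       /* hypothetical startup cost of the scan   */
    double total_cost   = 100000.0;   /* hypothetical cost to return all rows    */
    double plan_rows    = 1000000.0;  /* planner's (possibly wrong) row estimate */
    double limit_count  = 10.0;       /* LIMIT 10                                */

    /*
     * Scale the run cost by the fraction of rows the LIMIT is expected to
     * fetch.  If plan_rows is badly overestimated, or the first qualifying
     * row is actually expensive to reach, this fraction is meaningless --
     * that is the "continuous vs. discrete rows" problem described above.
     */
    double fraction = limit_count / plan_rows;
    double limited_cost = startup_cost + (total_cost - startup_cost) * fraction;

    printf("assumed fraction of rows fetched: %g\n", fraction);
    printf("estimated cost under LIMIT:       %.2f\n", limited_cost);
    return 0;
}

Under this model, a LIMIT 10 over a million-row estimate charges only one
hundred-thousandth of the run cost, no matter how far into the scan the
tenth qualifying row actually turns out to be.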