On Thu, Aug 30, 2018 at 5:58 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
>
> Alexander Korotkov <a.korot...@postgrespro.ru> writes:
> > On Thu, Aug 30, 2018 at 5:05 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> >> Because it's what the mental model of startup cost says it should be.
>
> > From this model we make a conclusion that we're starting getting rows
> > from sequential scan sooner than from index scan.  And this conclusion
> > doesn't reflect reality.
>
> No, startup cost is not the "time to find the first row".  It's overhead
> paid before you even get to start examining rows.
>
> I'm disinclined to consider fundamental changes to our costing model
> on the basis of this example.  The fact that the rowcount estimates are
> so far off reality means that you're basically looking at "garbage in,
> garbage out" for the cost calculations --- and applying a small LIMIT
> just magnifies that.
>
> It'd be more useful to think first about how to make the selectivity
> estimates better; after that, we might or might not still think there's
> a costing issue.
I understand that startup cost is not "time to find the first row".  But I
think this example highlights not one but two issues:

1) The row count estimates for the joins are wrong.
2) Rows are assumed to be continuous, while in reality they are discrete.

So, if we reverse the assumptions made in LIMIT clause estimation, we may
say that it basically assumes we need to fetch only a fraction of a row
from the sequential scan node (a rough sketch of that interpolation is at
the end of this message).  And even in the case where we really do fetch
101 rows in each join with t2, this logic would still lead us to the bad
plan.  I'm not proposing to rush into redesigning the planner to fix that;
I just think it's probably something worth discussing.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
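To make the "fraction of a row" point concrete, below is a minimal
standalone sketch of the kind of linear interpolation the LIMIT costing
performs between a path's startup and total cost.  It is illustrative
only, not the actual PostgreSQL code, and all the numbers (costs, row
estimate, limit count) are made up:

/*
 * Illustrative sketch, not PostgreSQL source: LIMIT is assumed to cost a
 * proportional fraction of the underlying path's run cost, as if rows were
 * produced at a perfectly uniform (continuous) rate.
 */
#include <stdio.h>

int main(void)
{
    double startup_cost = 0.43;       /* hypothetical startup cost of the scan   */
    double total_cost   = 100000.0;   /* hypothetical cost to return all rows    */
    double plan_rows    = 1000000.0;  /* planner's (possibly wrong) row estimate */
    double limit_count  = 10.0;       /* LIMIT 10                                */

    /*
     * Scale the run cost by the fraction of rows the LIMIT is expected to
     * fetch.  If plan_rows is badly overestimated, or the first qualifying
     * row is actually expensive to reach, this fraction is meaningless --
     * that is the "continuous vs. discrete rows" problem described above.
     */
    double fraction = limit_count / plan_rows;
    double limited_cost = startup_cost + (total_cost - startup_cost) * fraction;

    printf("assumed fraction of rows fetched: %g\n", fraction);
    printf("estimated cost under LIMIT:       %.2f\n", limited_cost);
    return 0;
}

Under this model, a LIMIT 10 over a million-row estimate charges only one
hundred-thousandth of the run cost, no matter how far into the scan the
tenth qualifying row actually turns out to be.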