On Fri, Mar 16, 2012 at 9:39 PM, Jeff Davis <pg...@j-davis.com> wrote:
> On Fri, 2012-03-16 at 18:25 +0000, Simon Riggs wrote:
>> Any time we apply a LIMIT clause to a plan with a SeqScan or
>> unqualified IndexScan, we shouldn't assume the scan will do less than
>> say 10% of the table. It might, but its an unsafe assumption because
>> as the selectivity decreases so does the safety of the assumption that
>> rows are uniformly distributed.
>
> Just trying to follow along. You mean "as the selectivity _increases_
> the safety of the assumption that the rows are uniformly distributed
> decreases", right?
By selectivity I mean the value between 0 and 1 that the planner uses to
describe the fraction of rows a scan will return; 1.0 means 100% of the
rows. When selectivity is low, very few rows come back. I think you are
using "high selectivity" to mean "returns few of the rows", so you do
understand me - just flip the meaning of the words.

When a predicate matches lots of rows, it's a reasonable assumption that
they are spread throughout the table and a scan will find some of them
quickly. When it matches very few rows, assuming they are evenly spaced
is just weird; clumpiness of some kind seems much more likely. That is
much easier to see when the values are dates, for example.

Given that the estimated number of rows is deliberately a worst case
(i.e. high), the scan sounds like it will work. Yet the reality is that
doing the scan is incredibly costly when those assumptions break, which
they do, often - especially when the values don't exist at all because
the table is sparse.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
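
To make the failure mode concrete, here is a hypothetical sketch. The
table, the row counts, and the 0.001 selectivity are invented for
illustration and are not taken from the thread:

    -- Hypothetical table: 10 million rows, loaded in date order, so all
    -- rows for a given date sit together physically in one part of the heap.
    CREATE TABLE events (
        id         bigserial PRIMARY KEY,
        created_at date NOT NULL,
        payload    text
    );

    -- Suppose the planner estimates selectivity 0.001 for the predicate,
    -- i.e. ~10,000 matching rows out of 10,000,000.
    SELECT *
    FROM   events
    WHERE  created_at = DATE '2012-03-16'
    LIMIT  10;

    -- Assuming the matches are uniformly spread, the planner expects to
    -- find 10 of them after reading roughly 10/10,000 = 0.1% of the table,
    -- so a seq scan under the LIMIT is costed as very cheap.  If the
    -- matching rows are clumped at the far end of the heap, or do not
    -- exist at all, the scan actually reads close to all 10 million rows
    -- before it can stop.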