On Sat, Mar 17, 2012 at 11:33 AM, Greg Stark <st...@mit.edu> wrote:
> On Sat, Mar 17, 2012 at 9:34 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> My wish was to register this as both a common and significant bug,
>
> It has definitely come up here before many times.
>
> However at root the problem is part of the general class of not
> understanding how two different columns are related. Postgres is
> assuming they're entirely independent and therefore all the values of
> x are uniformly distributed over values of y.
>
> To plan this better in your case it would have to know that blah_id <=
> 72572020 is not equally likely for user_id = ANY
> ('{....list....}'::integer[]) as it is for the table as a whole.
That is not the root cause in this case, though I agree that is also a
problem. The first plan is more complex because of an ORDER BY that
favours the index scan, but that's actually irrelevant here: the use of
blah_id is the user exploiting the fact that rows are allocated in date
order, to avoid doing a date sort.

The problems are as I described them:

(1) no account is made for sparsity, and other factors lead to an
overestimate of the number of rows (N);

(2) an inappropriate assumption about the effect of LIMIT m causes a
costly SeqScan to appear better than an IndexScan for low m/N, when in
fact that is seldom the case.

Overestimating N in (1) inverts the problem, so that an overestimate
isn't the safe thing at all.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
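[Editor's illustration] A rough way to see the mis-costing in (2): a planner of this style charges a LIMIT m plan roughly the fraction m/N of the plan's total cost, on the assumption that matching rows are spread uniformly. If N is overestimated, the SeqScan's scaled cost shrinks and it can beat the IndexScan on paper, then lose badly at run time. A minimal sketch with made-up cost numbers (these are illustrative units, not PostgreSQL's actual cost model or source):

```python
def scaled_cost(total_cost, n_rows, limit_m):
    """Charge a LIMIT m plan the fraction m/N of its total cost,
    assuming matches are uniformly spread through the input."""
    return total_cost * min(1.0, limit_m / n_rows)

# Invented numbers: the planner estimates N = 100,000 matching rows,
# but only 100 actually match. LIMIT 10 is applied on top.
N_EST, N_TRUE, LIMIT_M = 100_000, 100, 10
seq_total = 100_000.0   # full sequential scan of the table
idx_total = 400_000.0   # index scan fetching all N_EST rows (random I/O)

# Planner's view: the SeqScan looks 4x cheaper under the LIMIT.
est_seq = scaled_cost(seq_total, N_EST, LIMIT_M)    # 10.0
est_idx = scaled_cost(idx_total, N_EST, LIMIT_M)    # 40.0

# Reality: only N_TRUE rows match, so the SeqScan must read far more
# of the table before collecting LIMIT_M matches, while the IndexScan
# cost shrinks in proportion to the rows it actually fetches.
real_seq = scaled_cost(seq_total, N_TRUE, LIMIT_M)  # 10000.0
real_idx = (idx_total / N_EST) * LIMIT_M            # 40.0
```

With these numbers the planner picks the SeqScan (10 vs 40) and the query runs ~250x slower than the rejected IndexScan, which is exactly why an overestimated N is not the safe direction here.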