> >> When we 'EXPLAIN' this query, PostgreSQL says it is using the index > >> idx_receiveddatetime. The way the application is designed means that > >> in virtually all cases the query will have to scan a very long way > >> into idx_receiveddatetime to find the first record where userid = > 311369000. > >> If however we delete the idx_receiveddatetime index, the query uses > >> the idx_userid_receiveddatetime index, and the query only takes a few > >> milliseconds. > > > That's just bizarre ... it knows the index is applicable, and the cost > > estimates clearly favor the better index, so why did it pick the worse > > one? > > No, scratch that, I misread the plans. It *is* picking the plan it thinks has > lower cost; it's just a mistaken cost estimate. It's strange though that the less > selective indexscan is getting a lower cost estimate. I wonder whether your > table is (almost) perfectly ordered by receiveddatetime, such that the one- > column index has correlation close to 1.0. That could possibly lower the cost > estimate to the point where it'd appear to dominate the other index. It'd be > useful to see the pg_stats.correlation value for both the userid and > receiveddatetime columns.
Yes, the table is indeed nearly perfectly ordered by receiveddatetime (correlation 0.998479). correlation on userid is -0.065556. n_distinct on userid is also low: 1097. Is the problem perhaps something like the following: PostgreSQL is thinking that because there are not many userids and there is low correlation, that if it just scans the table from the top in date order, this will be cheap (because receiveddatetime correlation is high so it won't have to seek randomly), and it won't have to scan very far before it finds the first row with a matching userid. The problem is though that in our application the userids are not scattered randomly. There are small regions of the table where a particular userid appears frequently, interspersed with much larger regions (perhaps millions or tens of millions of rows) where it doesn't appear at all. So in fact the planner's preferred solution is often pathologically bad. Is there a solution? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance