Thanks, Tom. I had discarded the possibility of data type mismatch already,
which was your first guess, but was wondering if the lopsided distribution
of location values would lead the planner to make a decision that is good on
average but bad for this particular query, as you point out in your second
guess.

I'll try populating the test users with a more evenly distributed location
field, which will be more realistic anyway, and see if that works out
better.

BTW, the -1 is not really a dummy value, but it's just a value that we have
been using in tests for "fake test location ID". I just started performance
measurement for my application and so far had measured performance with
every user being in the same default location and things seemed to be going
well, so I tried to switch a couple users to a different location and see
what happened, and that made performance drop significantly.
(even more detail: my queries also limit results to 10 approx, so DB quickly
found 10 rows that match location -1, but it took a while to discover there
weren't more than 2 rows with the other value).

Thanks!
Eddy

On Sun, Nov 15, 2009 at 3:33 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Eddy Escardo-Raffo <eesca...@kikini.com> writes:
> > The table used in this query is called "users", and it has columns
> "userid"
> > (primary key) and "location".
> > The "location" column is indexed.
> > The users table has 1 million rows, and all rows have integer typed value
> > '-1' for  "location" column, except for 2 rows that have the integer
> value
> > '76543'.
>
> Oh, after poking at it a bit more, I realize the problem: the planner
> doesn't want to use an indexscan because it assumes there's a
> significant probability that the search will be for -1 (in which case
> the indexscan would be slower than a seqscan, as indeed your results
> prove).  Even though it could know in this particular case that the
> comparison value isn't -1, I doubt that teaching it that would help your
> real queries where it will probably be impossible to determine the
> comparison values in advance.
>
> I would suggest considering using NULL rather than inventing a dummy
> value for unknown locations.  The estimation heuristics will play a
> lot nicer with that choice.
>
>                        regards, tom lane
>

Reply via email to