<p...@elbrief.de> wrote:
 
> insert into bla ( a , b )
>   select a , a
>     from generate_series( 1 , 1000000 ) as a ( a ) ;
 
> explain analyze select * from bla
>   where b > 990000 order by a limit 10 ;
> [uses index on b and has a long run time]
 
The problem is that PostgreSQL doesn't have any sense of the
correlation between columns a and b (i.e., they are always equal)
and assumes that it will find enough matching rows soon enough on
the scan of the index on b to make it cheaper than sorting the
results of finding all rows that match the predicate.  Try your test
suite again with the only change being the insert statement:
 
insert into bla ( a , b )
  select a , floor(random() * 1000000) + 1
  from generate_series( 1 , 1000000 ) as a ( a ) ;
 
On my machine, with that data, all of the queries run fast.
 
We've been looking at ways to develop statistics on multiple
columns, so that correlations like that don't confuse the optimizer,
or trying to evaluate the "risk" of a query taking a long time based
on unexpected correlations.
 
Not really a bug; more like a recognized opportunity to improve the
optimizer.
 
-Kevin


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Reply via email to