On Thu, 4 Sep 2008, Guillaume Cottenceau wrote:
It seems to me that if the correlation is 0.99, and you're
looking for less than 1% of rows, the expected rows may be at the
beginning or at the end of the heap?

Not necessarily. Imagine for example that you have a table with 1M rows, and one of the fields has unique values from 1 to 1M, and the rows are ordered in the table by that field. So the correlation would be 1. If you were to SELECT from the table WHERE the field = 500000 LIMIT 1, then the database should be able to work out that the rows will be right in the middle of the table, not at the beginning or end. It should set the startup cost of a sequential scan to the amount of time required to sequential scan half of the table.

Of course, this does bring up a point - if the matching rows are concentrated at the end of the table, the database could perform a sequential scan backwards, or even a scan from the middle of the table onwards.

This improvement of course only actually helps if the query has a LIMIT clause, and presumably would muck up simultaneous sequential scans.

Matthew

--
Picard: I was just paid a visit from Q.
Riker:  Q! Any idea what he's up to?
Picard: No. He said he wanted to be "nice" to me.
Riker:  I'll alert the crew.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to