On Thu, 7 Jan 2010, Gurgel, Flavio wrote:
If a single query execution had a step that brought a page into the buffer cache, that would be enough to speed up another step and change the execution plan, since data access in memory is (usually) faster than on disk.

Postgres does not change a query plan according to the shared_buffers setting. It does not anticipate one step contributing to another step in this way. It does, however, make use of the effective_cache_size setting to estimate how much of the data is likely to be cached already, and that does affect the planner.
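You can see this directly in the plan (a minimal sketch; the table name and the value here are my assumptions, not from this thread):

    -- effective_cache_size is only a planner estimate, not an allocation.
    -- A larger value makes index scans look cheaper relative to seq scans.
    SET effective_cache_size = '6GB';
    EXPLAIN SELECT * FROM ontime WHERE year = 2009;

Re-run the EXPLAIN with a small value and the plan may flip, even though no memory usage actually changes.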

The use of the index over a seqscan has to be tested. I don't agree with the 50% gain, since simple integers stored in a B-tree have a good chance of being retrieved in the required order, and the discarded data will be discarded quickly too, so the gain has to be measured.

I bet that an index scan will be a lot faster, but it's just a bet :)

In a situation like this, the opposite will be true. If you were accessing a very small part of a table, say ordering by a field with a small LIMIT, then an index can be very useful by providing the results in the correct order. However, in this case, almost the entire table has to be read. Changing the order in which it is read will mean that the disc access is no longer sequential, which will slow things down, not speed them up. The Postgres planner isn't stupid (mostly); there is probably a good reason why it isn't using an index scan.
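If you want to settle the bet, it is easy to test (a sketch; the table and column names are assumptions):

    -- Compare plans and timings with and without sequential scans allowed.
    EXPLAIN ANALYZE SELECT dayofweek, count(*) FROM ontime GROUP BY dayofweek;
    SET enable_seqscan = off;  -- planner strongly avoids seq scans from here
    EXPLAIN ANALYZE SELECT dayofweek, count(*) FROM ontime GROUP BY dayofweek;

If the index scan version really does win on your hardware, then the cost settings (random_page_cost in particular) may want tuning, rather than forcing plans by hand.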

The table is very wide, which is probably why the tested databases can deal with it faster than PG. You could try to narrow the table down (for instance: remove the Div* fields) to make the data more "relational-like". In real life, speedups in these circumstances would probably be gained by normalizing the data to make the basic table smaller and easier to use with indexing.

Ugh. I don't think so. That's why indexes were invented. PostgreSQL is smart enough to "jump" over columns using byte offsets.
A better option for this table is to partition it into year (or year/month) chunks.

Postgres (mostly) stores all the columns for a row together in the row on disc, so what you say is completely wrong. Postgres does not "jump" over columns using byte offsets in this way. The index references a row in a page on disc, and that page is fetched separately in order to retrieve the row. The expensive part is physically moving the disc head to the right part of the disc in order to fetch the correct page - jumping over columns will not help with that at all.
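You can see the addressing an index actually uses via the system column ctid (a quick illustration; the table name is an assumption):

    -- ctid is (page number, tuple index within that page). An index entry
    -- points at this location, and the whole page is read to fetch the row.
    SELECT ctid, year, month FROM ontime LIMIT 5;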

Reducing the width of the table will greatly improve the performance of a sequential scan, as it will reduce the size of the table on disc, and therefore the time taken to read the entire table sequentially.
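As a rough sketch of the kind of narrowing meant above (the column names are my guesses at the dataset, not taken from the thread):

    -- Keep only the columns the reporting queries actually touch;
    -- a smaller heap on disc means a faster sequential scan.
    CREATE TABLE ontime_narrow AS
        SELECT year, month, dayofmonth, dayofweek, depdelay, arrdelay
        FROM ontime;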

Moreover, your suggestion of partitioning the table may not help much with this query. It will turn a single sequential scan into a UNION of many tables, which may be harder for the planner to plan. Also, for queries that access small parts of the table, indexes will help more than partitioning will.

Partitioning will help most in the case where you want to summarise a single year's data. Not really otherwise.
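For completeness, that kind of partitioning is done with inheritance plus CHECK constraints (a minimal sketch; table and column names are assumptions):

    -- One child table per year. With constraint exclusion on, the planner
    -- skips children whose CHECK constraint rules them out.
    CREATE TABLE ontime_2009 (
        CHECK (year = 2009)
    ) INHERITS (ontime);

    SET constraint_exclusion = on;
    -- This now scans only ontime_2009 (plus the normally-empty parent):
    SELECT count(*) FROM ontime WHERE year = 2009;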

Matthew

--
Q: What's the difference between ignorance and apathy?
A: I don't know, and I don't care.
