On Thu, 7 Jan 2010, Gurgel, Flavio wrote:
If a single query execution had a step that brought a page into the buffer cache, that would be enough to speed up another step and change the execution plan, since data access in memory is (usually) faster than on disk.

Postgres does not change a query plan according to the shared_buffers setting. It does not anticipate one step contributing to another step in this way. It does, however, make use of the effective_cache_size setting to estimate how much of the data is likely to be cached already, and that does affect the planner.
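You can see this directly in the plan (a minimal sketch; the table name and the value here are my assumptions, not from this thread):

    -- effective_cache_size is only a planner estimate, not an allocation.
    -- A larger value makes index scans look cheaper relative to seq scans.
    SET effective_cache_size = '6GB';
    EXPLAIN SELECT * FROM ontime WHERE year = 2009;

Re-run the EXPLAIN with a small value and the plan may flip, even though no memory usage actually changes.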

The use of the index over a seqscan has to be tested. I don't agree with the 50% gain, since simple integers stored in a B-tree have a good chance of being retrieved in the required order, and the discarded data will be discarded quickly too, so the gain has to be measured.

I bet that an index scan will be a lot faster, but it's just a bet :)

In a situation like this, the opposite will be true. If you were accessing a very small part of a table, say ordering by a field with a small LIMIT, then an index can be very useful by providing the results in the correct order. However, in this case, almost the entire table has to be read. Changing the order in which it is read will mean that the disc access is no longer sequential, which will slow things down, not speed them up. The Postgres planner isn't stupid (mostly); there is probably a good reason why it isn't using an index scan.
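If you want to settle the bet, it is easy to test (a sketch; the table and column names are assumptions):

    -- Compare plans and timings with and without sequential scans allowed.
    EXPLAIN ANALYZE SELECT dayofweek, count(*) FROM ontime GROUP BY dayofweek;
    SET enable_seqscan = off;  -- planner strongly avoids seq scans from here
    EXPLAIN ANALYZE SELECT dayofweek, count(*) FROM ontime GROUP BY dayofweek;

If the index scan version really does win on your hardware, then the cost settings (random_page_cost in particular) may want tuning, rather than forcing plans by hand.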

The table is very wide, which is probably why the tested databases can deal with it faster than PG. You could try to narrow the table down (for instance: remove the Div* fields) to make the data more "relational-like". In real life, speedups in these circumstances would probably be gained by normalizing the data to make the basic table smaller and easier to use with indexing.

Ugh. I don't think so. That's why indexes were invented. PostgreSQL is smart enough to "jump" over columns using byte offsets.
A better option for this table is to partition it into year (or year/month) chunks.

Postgres (mostly) stores all the columns for a row together in the row on disc, so what you say is completely wrong. Postgres does not "jump" over columns using byte offsets in this way. The index references a row in a page on disc, and that page is fetched separately in order to retrieve the row. The expensive part is physically moving the disc head to the right part of the disc in order to fetch the correct page - jumping over columns will not help with that at all.
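You can see the addressing an index actually uses via the system column ctid (a quick illustration; the table name is an assumption):

    -- ctid is (page number, tuple index within that page). An index entry
    -- points at this location, and the whole page is read to fetch the row.
    SELECT ctid, year, month FROM ontime LIMIT 5;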

Reducing the width of the table will greatly improve the performance of a sequential scan, as it will reduce the size of the table on disc, and therefore the time taken to read the entire table sequentially.
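As a rough sketch of the kind of narrowing meant above (the column names are my guesses at the dataset, not taken from the thread):

    -- Keep only the columns the reporting queries actually touch;
    -- a smaller heap on disc means a faster sequential scan.
    CREATE TABLE ontime_narrow AS
        SELECT year, month, dayofmonth, dayofweek, depdelay, arrdelay
        FROM ontime;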

Moreover, your suggestion of partitioning the table may not help much with this query. It will turn a single sequential scan into a UNION of many tables, which may be harder for the planner to plan. Also, for queries that access small parts of the table, indexes will help more than partitioning will.

Partitioning will help most in the case where you want to summarise a single year's data. Not really otherwise.
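For completeness, that kind of partitioning is done with inheritance plus CHECK constraints (a minimal sketch; table and column names are assumptions):

    -- One child table per year. With constraint exclusion on, the planner
    -- skips children whose CHECK constraint rules them out.
    CREATE TABLE ontime_2009 (
        CHECK (year = 2009)
    ) INHERITS (ontime);

    SET constraint_exclusion = on;
    -- This now scans only ontime_2009 (plus the normally-empty parent):
    SELECT count(*) FROM ontime WHERE year = 2009;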

Matthew

--
Q: What's the difference between ignorance and apathy?
A: I don't know, and I don't care.
