Currently, our sort algorithm assumes that its input is unsorted. So if your data is sorted on (a) and you would like it to be sorted on (a,b) then we need to perform the full sort of (a,b).
For small sorts this doesn't matter much. For larger sorts the heap sort algorithm will typically result in just a single run being written to disk which must then be read back in. Number of I/Os required is twice the total volume of data to be sorted. If we assume we use heap sort, then if we *know* that the data is presorted on (a) then we should be able to emit tuples directly that the value of (a) changes and keep emitting them until the heap is empty, since they will exit the heap in (a,b) order. The normal pattern of a sort node is to read all of its input and then begin emitting rows, so the sort is completed before the first row is returned (OK, lets gloss over the final merge bit for now). So what I'm suggesting is a sort node that will switch back and forth between consuming and emitting rows. In some cases it *may* still need to scroll to disk, but we already have that code. In this way we can completely avoid writing data to disk for some types of large sort, even when we have a data set much larger than available memory. Of course it only applies when we are refining an existing well-known sort order. This is a similar concept to the way we handled dynamic buffering of data for the merge join in 8.3, so I expect it to have an equally warm reception ;-) -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers