"Simon Riggs" <[EMAIL PROTECTED]> writes: > On Mon, 2007-12-03 at 10:32 -0800, Jeff Davis wrote: > >> If I understand correctly, we just keep track of the maximum value of >> the last block read from each run, and then always read from the run in >> which the last block read has the lowest maximum.
So it sounds like the use case where this is the biggest win would be a situation where you have presorted input which has been sliced up. So for example sorting by "zip code" in a table which was clustered by city. The alphabetic order of the cities isn't correlated to the results but all the zip codes for a city are in a contiguous block somewhere in the output. In such a case after doing a single pass we would have a bunch of tapes each of which corresponded to a single city and was able to completely reorder the zip codes in that city to be ordered. So the desired results would be, for example, all the tuples from tape 17 (NYC) followed by all the tuples from tape 3 (Buffalo) followed by all the tuples from tape 1 (Albuquerque), etc. We currently preread an equal amount from each tape and then would empty all the preread tuples from tape 17, refill them, preread them again, repeat until tape 17 is empty then move on to tape 3. All the tuples except the currently active tape are completely idle. I think the way to do what you're proposing is to preread one tuple from each tape, then when one preread bunch is emptied refill it with twice as many and repeat. In this case you would end up with nearly all of workmem full of tuples from NYC until you're done with NYC. That would increase the prereading block size by a factor of 20 in this case. So the question is just how many seeks are we doing during sorting. If we're doing 0.1% seeks and 99.9% sequential i/o then eliminating the 1% entirely (which we can't do) isn't going to speed up seeking all that much. If we're doing 20% seeks and can get that down to 10% it might be worthwhile. I'm not sure where the idea of keeping the current bounds of the input tapes comes into it. We only preread when we run out of tuples anyways and then we don't really have a choice about which tape we want to preread from. And it's a good thing too since maintaining such a list of bounds and finding the lowest or highest would mean maintaining a second heap which would basically double the cpu cost of sorting. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate