On Fri, Jul 31, 2015 at 12:59 AM, Heikki Linnakangas <hlinn...@iki.fi> wrote: > On 07/31/2015 02:01 AM, Peter Geoghegan wrote: >> >> What prevents the tuple at the top of the in-memory heap at the point >> of tuplesort_performsort() (say, one of the ones added to the heap as >> our glut of memory was*partially* consumed) being less than the >> last/greatest tuple on tape? If the answer is "nothing", a merge step >> is clearly required. > > > Oh, ok, I was confused on how the heap works.
I think I explained this badly, by referencing a secondary reason why we must do a merge. I will now do a better job of explaining why a merge of in-memory and on disk tuples is necessary, for the benefit of other people (I think you get it). The main reason why a merge step is required is that the memtuples array will contain some tuples that were classified as belonging to a second run. Therefore, those tuples may well be lower than the highest on-tape tuples in terms of sort order (in fact, they may be lower than any on-tape tuple). I cannot simply return all tape tuples followed by all in-memory tuples to the caller, and so I must merge, and so only !randomAccess callers may get a "quicksort with spillover". I can only get away with **avoiding dumping all tuples** and just merging instead because I "reject" this determination that a second *traditional/tape* run is needed. I am therefore free of any obligation to merge this would-be traditional second run separately. Another way of explaining it is that I do an all-in-memory merge of some part of the first run, and all the second run (by quicksorting). I then merge this with the original chunk of the first run that is sorted on tape (that was sorted by incremental spilling from the heap). The next version of the patch (the patch may be split in two -- "quicksort with spillover", and "merge sort" optimization) will make sure that any comparisons that go into maintaining the heap invariant are not wasted on the second run, since it will always be quicksorted. We only need to compare the second run tuples pre-quicksort in order to determine that they belong to that run and not the current (first) run. Does that make sense? Have I explained that well? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers