On Tue, Aug 4, 2015 at 1:24 AM, Heikki Linnakangas <hlinn...@iki.fi> wrote: > Yeah, something like that. To paraphrase, if I'm now understanding it > correctly, Peter's idea is: > > When all the tuples have been fed to tuplesort, and it's time to perform the > sort, quicksort all the tuples currently in the heap, ignoring the run > numbers, and turn the resulting array into another tape. That tape is > special: it's actually stored completely in memory. It is merged with the > "real" tapes when tuples are returned from the tuplesort, just like regular > tapes in TSS_FINALMERGE.
Yeah. I imagine that we'll want to put memory prefetch hints for the new case, since I've independently shown that that works well for the in-memory case, which this can be very close to. My next patch will also include quicksorting of runs after we give up on heapification (after there is more than one run and it is established that we cannot use my "quicksort with spillover" optimization, so there are two or more "real" runs on tape). Once there is clearly not going to be one huge run (which can happen due to everything largely being in order, even when work_mem is small), and once incrementally spilling does not end in time to do a "quicksort with spillover", then the replacement selection thing isn't too valuable. Especially with large memory sizes but memory bandwidth + latency as a bottleneck, which is the norm these days. This seems simpler than my earlier idea of reusing half the memtuples array only, and resorting the entire array each time, to have something that consistently approximates replacement selection but with quicksorting + batching, which I discussed before. I have this working, and it takes about a good chunk of the runtime off a sort that merges 3 runs on one reasonable case tested where work_mem was 300MB. It went from about 56.6 seconds with master to 35.8 seconds with this new approach when tested just now (this approach saves no writing of tuples, so it's not as effective as the original "quicksort with spillover" patch can be, but covers a fundamentally different case). I just need to clean up the patch, and see if I missed any further optimizations, but this feels like the way forward multi-run wise. I think it's worth giving up on replacement selection style runs after the first run is produced, because that's where the benefits are, if anywhere. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers