On 08/16/2016 03:33 AM, Peter Geoghegan wrote:
I attach a patch that changes how we maintain the heap invariant during tuplesort merging. I already mentioned this over on the "Parallel tuplesort, partitioning, merging, and the future" thread. As noted already on that thread, this patch makes merging clustered numeric input about 2.1x faster overall in one case, which is particularly useful in the context of a serial final/leader merge during a parallel CREATE INDEX. Even *random* non-C-collated text input is made significantly faster. This work is totally orthogonal to parallelism, though; it's just very timely, given our discussion of the merge bottleneck on this thread.
Nice!
The patch makes tuplesort merging shift down and displace the root tuple with the tape's next preread tuple, rather than compacting and then inserting into the heap anew. This approach to maintaining the heap as tuples are returned to caller will always produce fewer comparisons overall. The new approach is also simpler. We were already shifting down to compact the heap within the misleadingly named [2] function tuplesort_heap_siftup() -- why not instead just use the caller tuple (the tuple that we currently go on to insert) when initially shifting down (not the heap's preexisting last tuple, which is guaranteed to go straight to the leaf level again)? That way, we don't need to enlarge the heap at all through insertion, shifting up, etc. We're done, and are *guaranteed* to have performed less work (fewer comparisons and swaps) than with the existing approach (this is the reason for my optimism about getting this stuff out of the way early).
Makes sense.
This new approach is more or less the *conventional* way to maintain the heap invariant when returning elements from a heap during k-way merging. Our existing approach is convoluted; merging was presumably only coded that way because the generic functions tuplesort_heap_siftup() and tuplesort_heap_insert() happened to be available. Perhaps the problem was masked by unrelated bottlenecks that existed at the time, too.
Yeah, this seems like a very obvious optimization. Is there a standard name for this technique in the literature? I'm OK with "displace", or perhaps just "replace" or "siftup+insert", but if there's a standard name for this, let's use that.
- Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers