Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

Heikki Linnakangas Tue, 06 Sep 2016 00:10:08 -0700

On 08/16/2016 03:33 AM, Peter Geoghegan wrote:

I attach a patch that changes how we maintain the heap invariant
during tuplesort merging. I already mentioned this over on the
"Parallel tuplesort, partitioning, merging, and the future" thread. As
noted already on that thread, this patch makes merging clustered
numeric input about 2.1x faster overall in one case, which is
particularly useful in the context of a serial final/leader merge
during a parallel CREATE INDEX. Even *random* non-C-collated text
input is made significantly faster. This work is totally orthogonal to
parallelism, though; it's just very timely, given our discussion of
the merge bottleneck on this thread.


Nice!

The patch makes tuplesort merging shift down and displace the root
tuple with the tape's next preread tuple, rather than compacting and
then inserting into the heap anew. This approach to maintaining the
heap as tuples are returned to caller will always produce fewer
comparisons overall. The new approach is also simpler. We were already
shifting down to compact the heap within the misleadingly named [2]
function tuplesort_heap_siftup() -- why not instead just use the
caller tuple (the tuple that we currently go on to insert) when
initially shifting down (not the heap's preexisting last tuple, which
is guaranteed to go straight to the leaf level again)? That way, we
don't need to enlarge the heap at all through insertion, shifting up,
etc. We're done, and are *guaranteed* to have performed less work
(fewer comparisons and swaps) than with the existing approach (this is
the reason for my optimism about getting this stuff out of the way
early).


Makes sense.

This new approach is more or less the *conventional* way to maintain
the heap invariant when returning elements from a heap during k-way
merging. Our existing approach is convoluted; merging was presumably
only coded that way because the generic functions
tuplesort_heap_siftup() and tuplesort_heap_insert() happened to be
available. Perhaps the problem was masked by unrelated bottlenecks
that existed at the time, too.

Yeah, this seems like a very obvious optimization. Is there a standardname for this technique in the literature? I'm OK with "displace", orperhaps just "replace" or "siftup+insert", but if there's a standardname for this, let's use that.


- Heikki



--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

Reply via email to