Re: [HACKERS] Inlining comparators as a performance optimisation

Peter Geoghegan Wed, 21 Sep 2011 07:21:44 -0700

On 21 September 2011 07:51, Heikki Linnakangas
<[email protected]> wrote:
> On 21.09.2011 02:53, Peter Geoghegan wrote:
>>
>> C stdlib quick-sort time elapsed: 2.092451 seconds
>> Inline quick-sort time elapsed: 1.587651 seconds
>>
>> Does *that* look attractive to you?
>
> Not really, to be honest. That's a 25% speedup in pure qsorting speed. How
> much of a gain in a real query do you expect to get from that, in the best
> case? There's so many other sources of overhead that I'm afraid this will be
> lost in the noise.

I'm surprised that you're dismissive of this. After all, we have in
the past indulged in micro-optimisation of qsort, or so it would seem
from this comment:

 * We have modified their original by adding a check for already-sorted input,
 * which seems to be a win per discussions on pgsql-hackers around 2006-03-21.

"Makes affected queries radically faster" (In the best case, a speedup
somewhat greater than 12.5%) is an unreasonably high standard for a
performance optimisation of the executor in general (such a high
standard might be sensible if it was due to a particular
maintainability downside, but you didn't mention one). Even still, I
think that the 12.5% figure is pretty pessimistic - I've already sped
up the dell store query by almost that much, and that's with a patch
that was, due to circumstances, cobbled together.

Not only are we benefiting from the effects of inlining, we're also
benefiting from the removal of unnecessary indirection. As Tom said,
"In concrete terms, there would be no reason to have tuplesort.c's
myFunctionCall2Coll, and maybe not inlineApplySortFunction either, if
the datatype-specific comparison functions had APIs that were closer
to what sorting wants rather than following the general
SQL-callable-function API." He was just referring to the benefits of
removing indirection here, so ISTM that this is really two performance
optimisations rolled into one - it's conceivable that the total
performance improvement will even exceed the isolated inlining
comparator benchmark.

As I've said, I believe this patch can be committed without
compromising the maintainability of the tuplesort code to an extent
that is not clearly worth it, through the use of a clean, macro-based
abstraction. Concerns about bloated binaries are probably not well
founded, because what I'm proposing is to a certain extent emulating
C++ templates, while using a very common pattern used with C++
templates. In the C++ world, algorithms are often generalised as
templates, so that they can be used equally well with any datatype
(that supports the interface of the template), while availing of
compiler optimisations per template instantiation (instance of using a
given type with a given template). I actually got the idea for this
patch in part from a book that I read years ago that described the
fact that counter-intuitively, std::sort() consistently outperforms
qsort(), because the comparator is often inlined, and the compiler can
generally avail of optimisations from knowing the comparator at
compile-time.

On 21 September 2011 13:47, Greg Stark <[email protected]> wrote:
> This is pretty easy to measure. Just run oprofile or gprof and see
> what percentage of time for a big index build is spent in qsort.

I'll do so soon. I intend to get to this on Friday evening, and
perhaps have a proper patch to show next week.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Inlining comparators as a performance optimisation

Reply via email to