On 21 September 2011 07:51, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote: > On 21.09.2011 02:53, Peter Geoghegan wrote: >> >> C stdlib quick-sort time elapsed: 2.092451 seconds >> Inline quick-sort time elapsed: 1.587651 seconds >> >> Does *that* look attractive to you? > > Not really, to be honest. That's a 25% speedup in pure qsorting speed. How > much of a gain in a real query do you expect to get from that, in the best > case? There's so many other sources of overhead that I'm afraid this will be > lost in the noise.
I'm surprised that you're dismissive of this. After all, we have in the past indulged in micro-optimisation of qsort, or so it would seem from this comment: * We have modified their original by adding a check for already-sorted input, * which seems to be a win per discussions on pgsql-hackers around 2006-03-21. "Makes affected queries radically faster" (In the best case, a speedup somewhat greater than 12.5%) is an unreasonably high standard for a performance optimisation of the executor in general (such a high standard might be sensible if it was due to a particular maintainability downside, but you didn't mention one). Even still, I think that the 12.5% figure is pretty pessimistic - I've already sped up the dell store query by almost that much, and that's with a patch that was, due to circumstances, cobbled together. Not only are we benefiting from the effects of inlining, we're also benefiting from the removal of unnecessary indirection. As Tom said, "In concrete terms, there would be no reason to have tuplesort.c's myFunctionCall2Coll, and maybe not inlineApplySortFunction either, if the datatype-specific comparison functions had APIs that were closer to what sorting wants rather than following the general SQL-callable-function API." He was just referring to the benefits of removing indirection here, so ISTM that this is really two performance optimisations rolled into one - it's conceivable that the total performance improvement will even exceed the isolated inlining comparator benchmark. As I've said, I believe this patch can be committed without compromising the maintainability of the tuplesort code to an extent that is not clearly worth it, through the use of a clean, macro-based abstraction. Concerns about bloated binaries are probably not well founded, because what I'm proposing is to a certain extent emulating C++ templates, while using a very common pattern used with C++ templates. In the C++ world, algorithms are often generalised as templates, so that they can be used equally well with any datatype (that supports the interface of the template), while availing of compiler optimisations per template instantiation (instance of using a given type with a given template). I actually got the idea for this patch in part from a book that I read years ago that described the fact that counter-intuitively, std::sort() consistently outperforms qsort(), because the comparator is often inlined, and the compiler can generally avail of optimisations from knowing the comparator at compile-time. On 21 September 2011 13:47, Greg Stark <st...@mit.edu> wrote: > This is pretty easy to measure. Just run oprofile or gprof and see > what percentage of time for a big index build is spent in qsort. I'll do so soon. I intend to get to this on Friday evening, and perhaps have a proper patch to show next week. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers