On Fri, Nov 18, 2011 at 3:53 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> We have no proof that we need to do this for 10 or 100 data types. We
> only currently have proof that there is gain for the most common
> types.
Well, that's kind of my point. I think this needs more work before we decide what the best approach is. So far, the ONLY test result we have that compares inlining to not inlining shows a speedup from 60 ms to 52 ms. I think that an 8 ms speedup on one test with one datatype on one platform/compiler combination isn't sufficient evidence to conclude that this is the best possible approach.

I think the way to look at this is that this patch contains two possibly good ideas: one of which is to make the qsort() argument match the qsort() calling convention, and the other of which is to have multiple copies of the qsort() logic that inline the comparators for their respective datatypes. Tom hypothesized that most of the benefit was in the first idea, and the numbers that Peter posted seem to support that conclusion. The first idea is also less invasive and more broadly applicable, so to me that seems like the first thing to pursue.

Now, that doesn't mean that we shouldn't consider the second idea as well, but I don't believe that the evidence presented thus far is sufficient to prove that we should go that route. It seems entirely possible that inlining any non-trivial comparator function could work out to a loss, or that the exact choice of compiler flags could affect whether inlining works out to a plus or a minus (what happens with -O3 vs. -O2? what about -O1? what about -O2 -fno-omit-frame-pointer? what if they're using HP's aCC or Intel's icc rather than gcc?). There's no point in installing an optimization that could easily be a pessimization on some other workload, and that hasn't been tested, or at least no results have been posted here.

On the other hand, matching the calling convention of the comparator function to what qsort() wants and eliminating the trampoline seems absolutely certain to be a win in every case, and based on the work Peter has done it seems like it might be a quite large win.
In fact, you have to ask yourself just exactly how much our function-calling convention is costing us in general. We use that mechanism an awful lot and whatever loss of cycles is involved would be spread all over the code base where oprofile or gprof won't easily be able to pick it out. Even the cost at any particular call site will be split between the caller and the callee. There might be more stuff we could optimize here than just sorting...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers