On Wed, Sep 10, 2014 at 11:36 AM, Robert Haas <robertmh...@gmail.com> wrote: > No, not really. All you have to do is right a little test program to > gather the information.
I don't think a little test program is useful - IMV it's too much of a simplification to suppose that a strcoll() has a fixed cost, and a memcmp() has a fixed cost, and that we can determine algebraically that we should (say) proceed or not proceed with the additional opportunistic "memcmp() == 0" optimization based solely on that. I'm not sure if that's what you meant, but it might have been. Temporal locality is surely a huge factor here, for example. Are we talking about a memcmp() that always immediately precedes a similar strcoll() call on the same memory? Are we factoring in the cost of NUL-termination in order to make each strcoll() possible? And that's just for starters. However, I think it's perfectly fair to consider a case where the opportunistic "memcmp() == 0" thing never works out (as opposed to mostly not helping, which is what I considered earlier [1]), as long as we're sorting real tuples. You mentioned Heikki's test case; it seems fair to consider that, but for the non-abbreviated case where the additional, *totally* opportunistic "memcmp == 0" optimization only applies (so no abbreviated keys), while still having the additional optimization be 100% useless. Clearly that test case is also just about perfectly pessimal for this case too. (Recall that Heikki's test case shows performance on per-sorted input, so there are far fewer comparisons than would typically be required anyway - n comparisons, or a "bubble sort best case". If I wanted to cheat, I could reduce work_mem so that an external tape sort is used, since as it happens tapesort doesn't opportunistically check for pre-sorted input, but I won't do that. Heikki's case both emphasizes the amortized cost of a strxfrm() where we abbreviate, and in this instance de-emphasizes the importance of memory latency by having access be sequential/predictable.) The only variation I'm adding here to Heikki's original test case is to have a leading int4 attribute that always has a value of 1 -- that conveniently removes abbreviation (including strxfrm() overhead) as a factor that can influence the outcome, since right now that isn't under consideration. So: create table sorttest (dummy int4, t text); insert into sorttest select 1, 'foobarfo' || (g) || repeat('a', 75) from generate_series(10000, 30000) g; Benchmark: pg@hamster:~/tests$ cat heikki-sort.sql select * from (select * from sorttest order by dummy, t offset 1000000) f; pgbench -f heikki-sort.sql -T 100 -n With optimization enabled ==================== tps = 77.861353 (including connections establishing) tps = 77.862260 (excluding connections establishing) tps = 78.211053 (including connections establishing) tps = 78.212016 (excluding connections establishing) tps = 77.996117 (including connections establishing) tps = 77.997069 (excluding connections establishing) With optimization disabled (len1 == len2 thing is commented out) ================================================= tps = 78.719389 (including connections establishing) tps = 78.720366 (excluding connections establishing) tps = 78.764819 (including connections establishing) tps = 78.765712 (excluding connections establishing) tps = 78.472902 (including connections establishing) tps = 78.473844 (excluding connections establishing) So, yes, it looks like I might have just about regressed this case - it's hard to be completely sure. However, this is still a very unrealistic case, since invariably "len1 == len2" without the optimization ever working out, whereas the case that benefits [2] is quite representative. As I'm sure you were expecting, I still favor pursuing this additional optimization. If you think I've been unfair or not thorough, I'm happy to look at other cases. Also, I'm not sure that you accept that I'm justified in considering this a separate question to the more important question of what to do in the tie-breaker abbreviation case (where we may be almost certain that equality will be indicated by a memcmp()). If you don't accept that I'm right about that more important case, I guess that means that you don't have confidence in my ad-hoc cost model (the HyperLogLog/cardinality stuff). [1] http://www.postgresql.org/message-id/CAM3SWZR9dtGO+zX4VEn7GTW2=+umSNq=c57sjgxg8oqhjar...@mail.gmail.com [2] http://www.postgresql.org/message-id/cam3swzqtyv3kp+cakzjzv3rwb1ojjahwpcz9coyjxpkhbtc...@mail.gmail.com -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers