On 10/11/2015 03:20 AM, Peter Geoghegan wrote:
> On Thu, Sep 3, 2015 at 5:35 PM, David Rowley
> <david.row...@2ndquadrant.com> wrote:
>> My test cases are:
>
> Note that my text caching and unsigned integer comparison patches have
> moved the baseline down quite noticeably. I think that my mobile
> processor out-performs the Xeon you used for this, which seems a
> little odd even taken the change in baseline performance into account.
>
To add a caveat not yet mentioned, the idea behind prefetching is to
sacrifice spare memory bandwidth for performance. That can be a winning
bet on a quiet box (the one we benchmark on), but can backfire on a
production db, where the extra memory pressure can degrade all running
queries. Something to test for, or at least keep in mind.

>> set work_mem ='1GB';
>> create table t1 as select md5(random()::text) from
>> generate_series(1,10000000);
>>
>> Times are in milliseconds. Median and average over 10 runs.
>>
>> Test 1
>

I am the reluctant owner of outmoded hardware. Namely a core2 from
around 2007 on plain spinning metal. My results (linux 64bit):

------
Test 1
------
set work_mem ='1GB';
select count(distinct md5) from t1;

== Master ==
42771.040 ms <- outlier?
41704.570 ms
41631.660 ms
41421.877 ms

== Patch ==
42469.911 ms <- outlier?
41378.556 ms
41375.870 ms
41118.105 ms
41096.283 ms
41095.705 ms

------
Test 2
------
select sum(rn) from (select row_number() over (order by md5) rn from t1) a;

== Master ==
44186.775 ms
44137.154 ms
44111.271 ms
44061.896 ms
44109.122 ms

== Patch ==
44592.590 ms
44403.076 ms
44276.170 ms

Very slight difference in an ambiguous direction, but also no perf
catastrophe.

> It's worth considering that for some (possibly legitimate) reason, the
> built-in function call is ignored by your compiler, since GCC has
> license to do that. You might try this on both master and patched
> builds:
>
> ~/postgresql/src/backend/utils/sort$ gdb -batch -ex 'file tuplesort.o'
> -ex 'disassemble tuplesort_gettuple_common' > prefetch_disassembly.txt
>
> ...
>
> Notably, there is a prefetchnta instruction here.
>

I have verified that the prefetch is emitted in the disassembly.

An added benefit of owning outmoded hardware is that the MSR for this
generation is public and I can disable individual prefetcher units by
twiddling a bit.
Disabling the "HW prefetch" or the "DCU prefetch" units on a patched
version gave results that look relatively unchanged, which seems
promising. Disabling them both at once on an unpatched version shows a
slowdown of 5-6% in test1 (43347.181, 43898.705, 43399.428). That gives
an indication of maximum potential gains in this direction, for this box
at least.

Finally, I notice my results are 4x slower than everyone else's.
That can be very tough on a man's pride, let me tell you.

Amir

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
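
For anyone who wants to condense the master/patch timings posted above into medians, here is a minimal sketch. The numbers are copied from this message; the dictionary layout and the `relative_change` helper are just illustrative names, not part of any patch or tool discussed in the thread.

```python
from statistics import median

# Timings in milliseconds, copied from the runs posted in this message.
timings = {
    "Test 1": {
        "master": [42771.040, 41704.570, 41631.660, 41421.877],
        "patch": [42469.911, 41378.556, 41375.870,
                  41118.105, 41096.283, 41095.705],
    },
    "Test 2": {
        "master": [44186.775, 44137.154, 44111.271, 44061.896, 44109.122],
        "patch": [44592.590, 44403.076, 44276.170],
    },
}


def relative_change(master, patch):
    """Percent change of the patch median relative to the master median.

    Negative means the patched build was faster.
    """
    base = median(master)
    return (median(patch) - base) / base * 100.0


for name, runs in timings.items():
    print(
        f"{name}: master median {median(runs['master']):.3f} ms, "
        f"patch median {median(runs['patch']):.3f} ms, "
        f"change {relative_change(runs['master'], runs['patch']):+.2f}%"
    )
```

The sign of the change differs between the two tests, which matches the "ambiguous direction" remark. The run counts are small and unequal, so this is a rough summary at best and not a significance test.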