> I could do some tests with the patch on some larger machines. What exact
> tests do you propose? Are there some specific postgresql.conf settings and
> pgbench initialization you recommend for this? And was the test above just
> running 'pgbench -S' select-only with specific -T, -j and -c parameters?
>

With Andres' instructions I ran a couple of tests. With your patches I can
reliably reproduce a speedup of ~3% in single-core tests on a dual-socket
36-core machine for the pgbench select-only test case. When using the full
scale test, my results are unfortunately way too noisy, even for large runs.
I also tried some other queries (for example SELECTs that return 10 or 100
rows instead of just 1), but can't see much of a speed-up there either,
although it also doesn't hurt.

So I guess the most noticeable one is the select-only benchmark for 1 core:

<Master>
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 1
number of threads: 1
duration: 600 s
number of transactions actually processed: 30255419
latency average = 0.020 ms
latency stddev = 0.001 ms
tps = 50425.693234 (including connections establishing)
tps = 50425.841532 (excluding connections establishing)

<Patched>
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 1
number of threads: 1
duration: 600 s
number of transactions actually processed: 31363398
latency average = 0.019 ms
latency stddev = 0.001 ms
tps = 52272.326597 (including connections establishing)
tps = 52272.476380 (excluding connections establishing)

This is the one with 40 clients, 40 threads. Not really an improvement, and
still quite noisy.

<Master>
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 40
number of threads: 40
duration: 600 s
number of transactions actually processed: 876846915
latency average = 0.027 ms
latency stddev = 0.015 ms
tps = 1461407.539610 (including connections establishing)
tps = 1461422.084486 (excluding connections establishing)

<Patched>
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 40
number of threads: 40
duration: 600 s
number of transactions actually processed: 872633979
latency average = 0.027 ms
latency stddev = 0.038 ms
tps = 1454387.326179 (including connections establishing)
tps = 1454396.879195 (excluding connections establishing)

For tests that don't use the full machine (e.g. 10 clients, 10 threads) I see
speed-ups as well, but not as high as in the single-core run. It seems there
are other bottlenecks (on the machine) coming into play.

-Floris
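
For anyone who wants to recompute the relative differences from the tps
figures quoted above, a small sketch follows. The helper name
relative_change_pct is made up for illustration, and the inputs are simply
the "excluding connections establishing" values copied from the two runs; it
says nothing about run-to-run noise.

```python
def relative_change_pct(baseline_tps: float, patched_tps: float) -> float:
    """Return the change of patched vs. baseline throughput, in percent."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return (patched_tps - baseline_tps) / baseline_tps * 100.0


# tps values (excluding connections establishing) copied from the output above.
runs = {
    "1 client, 1 thread": (50425.841532, 52272.476380),
    "40 clients, 40 threads": (1461422.084486, 1454396.879195),
}

for label, (master_tps, patched_tps) in runs.items():
    change = relative_change_pct(master_tps, patched_tps)
    print(f"{label}: {change:+.2f}% patched vs. master")
```

On these particular numbers the single-client run comes out at roughly +3.7%
and the 40-client run at roughly -0.5%, which is well within the noise
described for the full-machine case.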