On Wed, 17 Jul 2019 at 11:06, Tom Lane <t...@sss.pgh.pa.us> wrote: > (Actually, I doubt that any of these changes will really move the > performance needle in the real world. It's more a case of wanting > the code to present good examples not bad ones.)
In spirit with the above, I'd quite like to fix a small bad example that I ended up with in nodeAppend.c and nodeMergeappend.c for run-time partition pruning. The code in question performs a loop over a list and checks bms_is_member() on each element and only performs an action if the member is present in the Bitmapset. It would seem much more efficient just to perform a bms_next_member() type loop then just fetch the list item with list_nth(), at least this is certainly the case when only a small number of the list items are indexed by the Bitmapset. With these two loops in particular, when a large number of list items are in the set the cost of the work goes up greatly, so it does not seem unreasonable to optimise the case for when just a few match. A quick test shows that it's hardly groundbreaking performance-wise, but test 1 does seem measurable above the noise. -- Setup plan_cache_mode = force_generic_plan max_locks_per_transaction = 256 create table ht (a int primary key, b int, c int) partition by hash (a); select 'create table ht' || x::text || ' partition of ht for values with (modulus 8192, remainder ' || (x)::text || ');' from generate_series(0,8191) x; \gexec -- Test 1: Just one member in the Bitmapset. test1.sql: \set p 1 select * from ht where a = :p Master: $ pgbench -n -f test1.sql -T 60 -M prepared postgres tps = 297.267191 (excluding connections establishing) tps = 298.276797 (excluding connections establishing) tps = 296.264459 (excluding connections establishing) tps = 298.968037 (excluding connections establishing) tps = 298.575684 (excluding connections establishing) Patched: $ pgbench -n -f test1.sql -T 60 -M prepared postgres tps = 300.924254 (excluding connections establishing) tps = 299.360196 (excluding connections establishing) tps = 300.197024 (excluding connections establishing) tps = 299.741215 (excluding connections establishing) tps = 299.748088 (excluding connections establishing) 0.71% faster -- Test 2: when all list items are found in the Bitmapset. test2.sql: select * from ht; Master: $ pgbench -n -f test2.sql -T 60 -M prepared postgres tps = 12.526578 (excluding connections establishing) tps = 12.528046 (excluding connections establishing) tps = 12.491347 (excluding connections establishing) tps = 12.538292 (excluding connections establishing) tps = 12.528959 (excluding connections establishing) Patched: $ pgbench -n -f test2.sql -T 60 -M prepared postgres tps = 12.503670 (excluding connections establishing) tps = 12.516133 (excluding connections establishing) tps = 12.404925 (excluding connections establishing) tps = 12.514567 (excluding connections establishing) tps = 12.541484 (excluding connections establishing) 0.21% slower With that removed the slowness of test 1 is almost entirely in AcquireExecutorLocks() and ExecCheckRTPerms(). We'd be up close to about 30k tps instead of 300 tps if there was some solution to those problems. I think it makes sense to remove the inefficient loops and leave the just final two bottlenecks, in the meantime. Patch attached. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
list_fixups_list_nth.patch
Description: Binary data