On Wed, 17 Jul 2019 at 11:06, Tom Lane <t...@sss.pgh.pa.us> wrote:
> (Actually, I doubt that any of these changes will really move the
> performance needle in the real world.  It's more a case of wanting
> the code to present good examples not bad ones.)

In the spirit of the above, I'd quite like to fix a small bad example
that I ended up with in nodeAppend.c and nodeMergeAppend.c for
run-time partition pruning.

The code in question loops over a list, calls bms_is_member() on each
element's index, and only performs an action if that index is present
in the Bitmapset.

It would seem much more efficient to perform a bms_next_member() type
loop and fetch each list item with list_nth(). That is certainly the
case when only a small number of the list items are indexed by the
Bitmapset. With these two loops in particular, when a large number of
list items are in the set, the cost of the per-item work goes up
greatly and dwarfs the looping overhead, so it does not seem
unreasonable to optimise for the case where just a few match.
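
To make the two loop shapes concrete, here is a minimal standalone sketch. It is not the code from the patch: TinySet, tiny_add(), tiny_is_member() and tiny_next_member() are hypothetical stand-ins for Bitmapset, bms_add_member(), bms_is_member() and bms_next_member(), and a plain int array stands in for the List, so that the example compiles without the PostgreSQL headers. Each function returns how many loop iterations it needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset: a fixed array of 64-bit words. */
#define NWORDS 4
typedef struct { uint64_t words[NWORDS]; } TinySet;

static void tiny_add(TinySet *s, int x)
{
    assert(x >= 0 && x < NWORDS * 64);
    s->words[x / 64] |= UINT64_C(1) << (x % 64);
}

static int tiny_is_member(const TinySet *s, int x)
{
    return (int) ((s->words[x / 64] >> (x % 64)) & 1);
}

/* Smallest member greater than prev, or -1 if none; start with prev = -1. */
static int tiny_next_member(const TinySet *s, int prev)
{
    for (int x = prev + 1; x < NWORDS * 64; x++)
    {
        uint64_t w = s->words[x / 64] >> (x % 64);

        if (w == 0)
        {
            /* nothing left in this word: jump to its last bit */
            x = (x / 64) * 64 + 63;
            continue;
        }
        if (w & 1)
            return x;
    }
    return -1;
}

/* Old shape: walk every item and test membership of its index. */
static int visit_by_scanning(const int *items, int nitems,
                             const TinySet *valid, long *sum)
{
    int steps = 0;

    for (int i = 0; i < nitems; i++)
    {
        steps++;
        if (tiny_is_member(valid, i))
            *sum += items[i];
    }
    return steps;
}

/* New shape: walk only the set members and fetch each item by index. */
static int visit_by_members(const int *items, int nitems,
                            const TinySet *valid, long *sum)
{
    int steps = 0;
    int i = -1;

    while ((i = tiny_next_member(valid, i)) >= 0)
    {
        assert(i < nitems);     /* same precondition an indexed fetch has */
        steps++;
        *sum += items[i];
    }
    return steps;
}
```

With 200 items and two members in the set, the first function iterates 200 times and the second only twice, and both visit the same items. The second shape only pays off if fetching by index is cheap, which is the assumption being made about list_nth() here.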

A quick test shows that it's hardly groundbreaking performance-wise,
but test 1 does seem measurable above the noise.

-- Setup
plan_cache_mode = force_generic_plan
max_locks_per_transaction = 256

create table ht (a int primary key, b int, c int) partition by hash (a);
select 'create table ht' || x::text || ' partition of ht for values
with (modulus 8192, remainder ' || (x)::text || ');' from
generate_series(0,8191) x;
\gexec

-- Test 1: Just one member in the Bitmapset.

test1.sql:
\set p 1
select * from ht where a = :p

Master:

$ pgbench -n -f test1.sql -T 60 -M prepared postgres
tps = 297.267191 (excluding connections establishing)
tps = 298.276797 (excluding connections establishing)
tps = 296.264459 (excluding connections establishing)
tps = 298.968037 (excluding connections establishing)
tps = 298.575684 (excluding connections establishing)

Patched:

$ pgbench -n -f test1.sql -T 60 -M prepared postgres
tps = 300.924254 (excluding connections establishing)
tps = 299.360196 (excluding connections establishing)
tps = 300.197024 (excluding connections establishing)
tps = 299.741215 (excluding connections establishing)
tps = 299.748088 (excluding connections establishing)

0.71% faster

-- Test 2: when all list items are found in the Bitmapset.

test2.sql:
select * from ht;

Master:

$ pgbench -n -f test2.sql -T 60 -M prepared postgres
tps = 12.526578 (excluding connections establishing)
tps = 12.528046 (excluding connections establishing)
tps = 12.491347 (excluding connections establishing)
tps = 12.538292 (excluding connections establishing)
tps = 12.528959 (excluding connections establishing)

Patched:

$ pgbench -n -f test2.sql -T 60 -M prepared postgres
tps = 12.503670 (excluding connections establishing)
tps = 12.516133 (excluding connections establishing)
tps = 12.404925 (excluding connections establishing)
tps = 12.514567 (excluding connections establishing)
tps = 12.541484 (excluding connections establishing)

0.21% slower
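
For anyone wanting to check the two percentages: they appear to be the relative change in mean tps over the five runs of each test. The sketch below reproduces that arithmetic under that assumption; the helper names mean_tps() and pct_change() are made up for illustration, and the arrays are just the tps figures listed above.

```c
#include <stddef.h>

/* tps figures copied from the pgbench runs above */
static const double t1_master[]  = {297.267191, 298.276797, 296.264459,
                                    298.968037, 298.575684};
static const double t1_patched[] = {300.924254, 299.360196, 300.197024,
                                    299.741215, 299.748088};
static const double t2_master[]  = {12.526578, 12.528046, 12.491347,
                                    12.538292, 12.528959};
static const double t2_patched[] = {12.503670, 12.516133, 12.404925,
                                    12.514567, 12.541484};

/* Arithmetic mean of n samples. */
static double mean_tps(const double *v, size_t n)
{
    double total = 0.0;

    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total / (double) n;
}

/* Relative change of the patched mean against the master mean, in percent. */
static double pct_change(const double *master, const double *patched, size_t n)
{
    double base = mean_tps(master, n);

    return (mean_tps(patched, n) - base) / base * 100.0;
}
```

pct_change(t1_master, t1_patched, 5) comes out at roughly +0.71 and pct_change(t2_master, t2_patched, 5) at roughly -0.21, matching the figures quoted for the two tests.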

With those loops removed, the slowness of test 1 is almost entirely in
AcquireExecutorLocks() and ExecCheckRTPerms(). We'd be up close to
about 30k tps instead of 300 tps if there were some solution to those
problems. I think it makes sense to remove the inefficient loops and
leave just the final two bottlenecks in the meantime.

Patch attached.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment: list_fixups_list_nth.patch
Description: Binary data