I did a fair dive into double-checking the decision to just leave xloginsert_locks fixed at 8 for 9.4. My conclusion: good call, move along. Further improvements beyond what the 8-way split gives are certainly possible. But my guess from chasing them a little is that additional bottlenecks will pop up, ones that must also be tweaked before you'll see those gains turn significant.

I'd like to see that box re-opened at some point. But if we do that, I'm comfortable it could end with an xloginsert_locks that tunes itself reasonably on large servers, similar to wal_buffers. Nothing about this makes me feel like it needs a GUC. I barely needed an exposed knob to do this evaluation.

= Baseline =

I rolled back a few commits to just before the GUC was removed and tested against that point in the git history. Starting with the 4-client test case Heikki provided, the fastest runs on my 24-core server looked like this:

tps = 56.691855 (including connections establishing)
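For anyone who wants to rewind to the same starting point, it's the usual git archaeology; this is just a sketch, with a placeholder where the real commit hash goes:

git log --oneline -S xloginsert_locks   # hunt down the commit that removed the GUC
git checkout <removal-commit>^          # build the tree from just before it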

Repeat runs do need to drop the table and rebuild it, because eventually autovacuum kicks in on things in a big way, and then your test is toast until it's done. Attached is what I settled on for a test harness. Nothing here was so subtle that I felt a more complicated harness was needed.
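A minimal sketch of that repeat cycle, assuming the same foo table and fooinsert.sql file used by the attached harness:

for run in 1 2 3; do
    psql postgres -c "drop table if exists foo"
    psql postgres -c "create table foo (id int4)"
    pgbench postgres -n -f fooinsert.sql -c 4 -T10
done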

Standard practice for me is to give pgbench more workers whenever scalability is the question. That gives a tiny improvement here, to where this is typical with 4 clients and 4 workers:

tps = 60.942537 (including connections establishing)

Increasing to 24 clients plus 24 workers gives roughly the same numbers, suggesting that the bottleneck here is not the client count, and that the suggested 4 clients was already enough:

tps = 56.731581 (including connections establishing)
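That run is nothing more than the attached harness with the counts raised, i.e. something like:

pgbench postgres -n -f fooinsert.sql -c 24 -j 24 -T10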

With xloginsert_locks decreased to 1, which recreates the original problem, the rate normally looks like this instead:

tps = 25.384708 (including connections establishing)
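Since I'm on the pre-removal build where this is still a GUC, switching between these configurations is just a postgresql.conf edit, roughly like this; on shipped 9.4 the equivalent knob is the NUM_XLOGINSERT_LOCKS constant in xlog.c instead:

# postgresql.conf on the pre-removal build
xloginsert_locks = 1      # the original problem case
#xloginsert_locks = 8     # the default this whole exercise is validating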

So you get the big return just fine with the default tuning; great. I'm happy to see it ship like this, as good enough for 9.4.

= More locks =

For the next phase, I stuck to 24 clients and 24 workers. If I then bump up xloginsert_locks to something much larger, there is an additional small gain to be had. With 24 locks, so basically every client has their own, instead of 57-60 TPS I managed to get as high as this:

tps = 66.790968 (including connections establishing)

However, the minute I get into this territory, there's an obvious bottleneck shift going on too. One example: WAL segments are now filled quickly enough that checkpoints become the troublesome part, with messages like this:

LOG:  checkpoints are occurring too frequently (1 second apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
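Following that hint is the obvious next tuning step before chasing the lock count any further; something along these lines in postgresql.conf, with values that are illustrative rather than anything I settled on:

checkpoint_segments = 64              # up from the 9.4 default of 3
checkpoint_completion_target = 0.9    # spread the checkpoint writes out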

When 9.4 is already giving better than a 100% gain on this targeted test case, I can't see that chasing after maybe an extra 10% is worth having yet another GUC around. Especially when it will probably take multiple tuning steps before you're done anyway; we don't really know what the rest of those are yet; and once we do, we probably won't need a GUC to cope with them either.

--
Greg Smith greg.sm...@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/
#!/bin/bash
# Test harness: recreate the target table, then run the insert workload.
CLIENTS=4
psql postgres -c "drop table if exists foo"
psql postgres -c "create table foo (id int4)"
pgbench postgres -n -f fooinsert.sql -c $CLIENTS -j $CLIENTS -T10

fooinsert.sql:
insert into foo select g from generate_series(1, 10000) g;