On 2014-10-02 20:08:33 -0400, Greg Smith wrote:
> I did a fair dive into double-checking the decision to just leave
> xloginsert_locks fixed at 8 for 9.4.  My conclusion: good call, move
> along.  Further improvements beyond what the 8-way split gives sure are
> possible.  But my guess from chasing them a little is that additional
> places will pop up as things that must also be tweaked, before you'll
> see those gains turn significant.
Thanks for doing this.

> I'd like to see that box re-opened at one point.  But if we do that, I'm
> comfortable that could end with a xloginsert_locks that tunes itself
> reasonably on large servers in the end, similar to wal_buffers.  There's
> nothing about this that makes me feel like it needs a GUC.  I barely
> needed an exposed knob to do this evaluation.
>
> = Baseline =
>
> I rolled back a few commits to just before the GUC was removed and
> tested against that point in git time.  Starting with the 4 client test
> case Heikki provided, the fastest runs on my 24 core server looked like
> this:
>
> tps = 56.691855 (including connections establishing)
>
> Repeat runs do need to drop the table and rebuild, because eventually AV
> kicks in on things in a big way, and then your test is toast until it's
> done.  Attached is what I settled on for a test harness.  Nothing here
> was so subtle I felt a more complicated harness was needed.
>
> Standard practice for me is to give pgbench more workers when worrying
> about any scalability tests.  That gives a tiny improvement, to where
> this is typical with 4 clients and 4 workers:
>
> tps = 60.942537 (including connections establishing)
>
> Increasing to 24 clients plus 24 workers gives roughly the same numbers,
> suggesting that the bottleneck here is certainly not the client count,
> and that the suggestion of 4 was high enough:
>
> tps = 56.731581 (including connections establishing)
>
> Decreasing xloginsert_locks to 1, so back to the original problem, the
> rate normally looks like this instead:
>
> tps = 25.384708 (including connections establishing)
>
> So the big return you get just fine with the default tuning; great.  I'm
> happy to see it ship like this as good enough for 9.4.
>
> = More locks =
>
> For the next phase, I stuck to 24 clients and 24 workers.  If I then
> bump up xloginsert_locks to something much larger, there is an
> additional small gain to be had.
> With 24 locks, so basically every client has their own, instead of
> 57-60 TPS, I managed to get as high as this:
>
> tps = 66.790968 (including connections establishing)
>
> However, the minute I get into this territory, there's an obvious
> bottleneck shift going on in there too.  The rate of creating new
> checkpoint segments becomes troublesome as one example, with messages
> like this:
>
> LOG:  checkpoints are occurring too frequently (1 second apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
>
> When 9.4 is already giving a more than 100% gain on this targeted test
> case, I can't see that chasing after maybe an extra 10% is worth having
> yet another GUC around.  Especially when it will probably take multiple
> tuning steps before you're done anyway; we don't really know the rest of
> them yet; and when we do, we probably won't need a GUC to cope with them
> in the end anyway.

I've modified the test slightly, by having the different backends insert
into different relations.  Even on my measly 5-year-old workstation I *do*
see quite a bit more than 10%.

psql -f /tmp/prepare.sql && pgbench -n -f /tmp/fooinsert.sql -c 64 -j 64 -T 10

on a 2x E5520 server (2 sockets with 4 cores and 2 threads each) with the
following configuration:

-c shared_buffers=2GB
-c wal_level=hot_standby
-c full_page_writes=off
-c checkpoint_segments=400
-c fsync=off (io system here is abysmally bad)
-c synchronous_commit=off

#define NUM_XLOGINSERT_LOCKS 1
tps = 52.711939 (including connections establishing)

#define NUM_XLOGINSERT_LOCKS 8
tps = 286.496054 (including connections establishing)

#define NUM_XLOGINSERT_LOCKS 16
tps = 346.113313 (including connections establishing)

#define NUM_XLOGINSERT_LOCKS 24
tps = 363.242111 (including connections establishing)

I'd not be surprised at all if you'd see a bigger influence on a system
with 4 sockets.
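To put those runs side by side, the speedup over the single-lock baseline
can be computed with a small awk sketch (the results-file layout here is my
own, not part of the attached harness; the numbers are the ones quoted
above):

```shell
# locks/tps pairs copied from the pgbench runs above
printf '%s\n' \
  '1 52.711939' \
  '8 286.496054' \
  '16 346.113313' \
  '24 363.242111' > results.txt

# speedup of each NUM_XLOGINSERT_LOCKS setting over the 1-lock baseline
awk 'NR == 1 { base = $2 } { printf "%2d locks: %.2fx\n", $1, $2 / base }' results.txt
```

That prints roughly a 5.4x gain at 8 locks, tapering to about 6.9x at 24.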
Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
insert into foo_:client_id select g from generate_series(1, 10000) g;
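Each transaction in this script inserts 10,000 rows into the client's
private table, so the tps figures quoted above translate into raw insert
rates; a back-of-the-envelope check (not from the original mail):

```shell
# rows inserted per second = tps * rows per transaction (10,000 here)
for run in '1 52.711939' '24 363.242111'; do
  set -- $run
  awk -v locks="$1" -v tps="$2" \
    'BEGIN { printf "%2d locks: %.1fM rows/s\n", locks, tps * 10000 / 1e6 }'
done
```

So the 24-lock run is sustaining on the order of 3.6M single-column inserts
per second versus about 0.5M with one lock.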
CREATE OR REPLACE FUNCTION exec(text) returns text
language plpgsql volatile AS $f$
BEGIN
  EXECUTE $1;
  RETURN $1;
END;
$f$;
\o /dev/null
SELECT exec('drop table if exists foo_'||g.i||'; create table foo_'||g.i||'(id int4);')
  FROM generate_series(1, 64) g(i);
\o
CHECKPOINT;
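The exec() helper is there because plain SQL can't run dynamic DDL from a
SELECT; the same 64 per-client tables could equally be generated from the
shell and piped into psql (an alternative sketch, not the attached
harness):

```shell
# emit one DROP/CREATE pair per pgbench client; feed to psql if desired
for i in $(seq 1 64); do
  printf 'drop table if exists foo_%d; create table foo_%d(id int4);\n' "$i" "$i"
done > prepare_generated.sql

wc -l < prepare_generated.sql   # one line per table, 64 in total
```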
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers