Ok, I've committed this patch now. Finally, phew!

I think I've addressed all your comments about the comments. I moved some of the comments around: I split up the large one near the top of the file, moving its paragraphs closer to the code where they apply.

Regarding your performance-related worries: you have good thoughts on how to improve things, but let's wait until we see some evidence that there is a problem, before any further optimizations.

I fixed one bug related to aligning the WAL buffers. The patch assumes WAL buffers to be aligned at a full XLOG_BLCKSZ boundary, but did not enforce it. That was already happening on platforms with O_DIRECT, which is why I didn't notice that in testing, but it would've failed on others.

I just remembered one detail that I'm not sure has been mentioned on the mailing list yet. Per the commit message:

This has one user-visible change: switching to a new WAL segment with
pg_switch_xlog() now fills the remaining unused portion of the
segment with zeros. This potentially adds some overhead, but it has
been a very common practice by DBA's to clear the "tail" of the
segment with an external pg_clearxlogtail utility anyway, to make the
WAL files compress better. With this patch, it's no longer necessary
to do that.


I simplified the handling of xlogInsertingAt per discussion, and added the memory barrier to GetXLogBuffer(). I ran again the pgbench tests I did earlier with the now-committed version of the patch (except for some comment changes). The results are here:

http://hlinnaka.iki.fi/xloginsert-scaling/xloginsert-scale-26/

I tested three different workloads. with different numbers of "slots", ranging from 1 to 1000. The tests were run on a 32-core machine, in a VM. As the baseline, I used a fresh checkout from master branch, with this one-line patch: http://www.postgresql.org/message-id/519a938a.1070...@vmware.com. That patch adjusts the spinlock delay loop, which happens to make a big difference on this box. We'll have to revisit and apply that patch separately, but I think that's the correct baseline to test this xloginsert scaling patch against.

nobranch
--------

This is the "pgbench -N" workload. Throughput is mainly limited by flushing the WAL at commits. The patch makes no big difference here, which is good. The point of the test is to demonstrate that the patch doesn't make WAL flushing noticeably more expensive.

nobranch-sync-commit-off
------------------------

Same as above, but with synchronous_commit=off. Here the patch somewhat. WAL insertion doesn't seem to be the biggest limiting factor in this test, but it's nice to see some benefit.

xlogtest
--------

The workload in this test is a single INSERT statement that inserts a lot of rows: "INSERT INTO foo:client_id SELECT 1 FROM generate_series(1,100) a, generate_series(1,100) b". Each client inserts to a separate table, to eliminate as much lock contention as possible, making the WAL insertion bottleneck as serious as possible (although I'm not sure how much difference that makes). This pretty much a best case scenario for this patch.

This test shows a big gain from the patch, as it should. The peak performance goes from about 35 TPS to 100 TPS. With the patch, I suspect the test saturates the I/O subsystem at that point. I think it could go higher with better I/O hardware.


All in all, I'm satisfied enough with this to commit. The default number of insertion slots, 8, seems to work fine for all the workloads on this box. We may have to adjust that or other details later, but what it needs now is more testing by people with different hardware.

Thanks to everyone involved for the review and testing! And if you can, please review the patch as committed once more.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to